Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

Sudakshina Das Fri, 15 Nov 2019 04:08:36 -0800

Hi Richard

I apologise I should have given more explanation on my cover letter.
Although the bug was filed for vectorization, the conversation on it talked
about loops with two exits not being supported in the vectorizer and being not
being possible without lto and peeling causing more harm than benefit.

There was also no clear consensus among the discussion about the best way to do
unrolling. So I looked at Wilco's suggestion of unrolling here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8
Although unroll_stupid does not exactly unroll it as he shows but
it gets closer than unroll_runtime_iterations. So I ran an experiment
to see if unrolliong the loop with unroll_stupid gets any benefit. The code
size benefit was easy to see with the small example but it also gave 
performance
benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds
a switch case at the beginning for iteration check. This is less efficient
because it creates too many branch close together specially for a loop which
has more than 1 exit.
        beq     .L70
        cmp     x12, 1
        beq     .L55
        cmp     x12, 2
        beq     .L57
        cmp     x12, 3
        beq     .L59
        cmp     x12, 4
        beq     .L61
        cmp     x12, 5
        beq     .L63
        cmp     x12, 6
        bne     .L72

Finally I agree that unroll_stupid by default did not touch loops with multiple
exists but that was marked as a "TODO" to change later so I assumed that check
was not a hard requirement for the unrolling alghorithm.
/* Do not unroll loops with branches inside -- it increases number
     of mispredicts. 
     TODO: this heuristic needs tunning; call inside the loop body
     is also relatively good reason to not unroll.  */
unroll_stupid is also not touched unless there is -funroll-all-loops or a loop
pragma incidcating that maybe this could be potentially harmful on certain 
targets.
Since my experiments on AArch64 showed otherwise, I thought the easiest 
starting
point would be to do this in a target hook and only for a specific case
(multiple exits).

Thanks
Sudi








From: Richard Biener <richard.guent...@gmail.com>

Sent: Friday, November 15, 2019 9:32 AM

To: Sudakshina Das <sudi....@arm.com>

Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrill Tkachov 
<kyrylo.tkac...@foss.arm.com>; James Greenhalgh <james.greenha...@arm.com>; 
Richard Earnshaw <richard.earns...@arm.com>; bin.ch...@linux.alibaba.com 
<bin.ch...@linux.alibaba.com>;
 o...@ucw.cz <o...@ucw.cz>

Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
 

On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das <sudi....@arm.com> wrote:

>

> Hi

>

> This patch is trying to fix PR88398 for AArch64. As discussed in the PR,

> loop unrolling is probably what we can do here. As an easy fix, the

> existing unroll_stupid is unrolling the given example better than the

> unroll_runtime_iterations since the the loop contains a break inside it.


Hm, the bug reference doesn't help me at all in reviewing this - the bug

is about vectorization.


So why is unroll_stupid better than unroll_runtime_iterations for a loop

with a break (or as your implementation, with multiple exists)?


I don't like this target hook, it seems like general heuristics can be

improved here, but it seems unroll-stupid doesn't consider loops

with multiple exits at all?


Richard.


> So all I have done here is:

> 1) Add a target hook so that this is AArch64 specific.

> 2) We are not unrolling the loops that decide_unroll_runtime_iterations

> would reject.

> 3) Out of the ones that decide_unroll_runtime_iterations would accept,

> check if the loop has more than 1 exit (this is done in the new target

> hook) and if it does, try to unroll using unroll_stupid.

>

> Regression tested on AArch64 and added the test from the PR. This gives

> an overall code size reduction of 2.35% and performance gain of 0.498%

> on Spec2017 Intrate.

>

> Is this ok for trunk?

>

> Thanks

> Sudi

>

> gcc/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  <sudi....@arm.com>

>

>         PR88398

>         * cfgloop.h: Include target.h.

>         (lpt_dec): Move to...

>         * target.h (lpt_dec): ... Here.

>         * target.def: Define TARGET_LOOP_DECISION_ADJUST.

>         * loop-unroll.c (decide_unroll_runtime_iterations): Use new target 
> hook.

>         (decide_unroll_stupid): Likewise.

>         * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New 
> function.

>         (TARGET_LOOP_DECISION_ADJUST): Define for AArch64.

>         * doc/tm.texi: Regenerated.

>         * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.

>

> gcc/testsuite/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  <sudi....@arm.com>

>

>         PR88398

>         * gcc.target/aarch64/pr88398.c: New test.

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

Reply via email to