Hi Richard I apologise I should have given more explanation on my cover letter. Although the bug was filed for vectorization, the conversation on it talked about loops with two exits not being supported in the vectorizer and being not being possible without lto and peeling causing more harm than benefit. There was also no clear consensus among the discussion about the best way to do unrolling. So I looked at Wilco's suggestion of unrolling here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8 Although unroll_stupid does not exactly unroll it as he shows but it gets closer than unroll_runtime_iterations. So I ran an experiment to see if unrolliong the loop with unroll_stupid gets any benefit. The code size benefit was easy to see with the small example but it also gave performance benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds a switch case at the beginning for iteration check. This is less efficient because it creates too many branch close together specially for a loop which has more than 1 exit. beq .L70 cmp x12, 1 beq .L55 cmp x12, 2 beq .L57 cmp x12, 3 beq .L59 cmp x12, 4 beq .L61 cmp x12, 5 beq .L63 cmp x12, 6 bne .L72 Finally I agree that unroll_stupid by default did not touch loops with multiple exists but that was marked as a "TODO" to change later so I assumed that check was not a hard requirement for the unrolling alghorithm. /* Do not unroll loops with branches inside -- it increases number of mispredicts. TODO: this heuristic needs tunning; call inside the loop body is also relatively good reason to not unroll. */ unroll_stupid is also not touched unless there is -funroll-all-loops or a loop pragma incidcating that maybe this could be potentially harmful on certain targets. Since my experiments on AArch64 showed otherwise, I thought the easiest starting point would be to do this in a target hook and only for a specific case (multiple exits). Thanks Sudi
From: Richard Biener <richard.guent...@gmail.com> Sent: Friday, November 15, 2019 9:32 AM To: Sudakshina Das <sudi....@arm.com> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>; James Greenhalgh <james.greenha...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>; bin.ch...@linux.alibaba.com <bin.ch...@linux.alibaba.com>; o...@ucw.cz <o...@ucw.cz> Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64 On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das <sudi....@arm.com> wrote: > > Hi > > This patch is trying to fix PR88398 for AArch64. As discussed in the PR, > loop unrolling is probably what we can do here. As an easy fix, the > existing unroll_stupid is unrolling the given example better than the > unroll_runtime_iterations since the the loop contains a break inside it. Hm, the bug reference doesn't help me at all in reviewing this - the bug is about vectorization. So why is unroll_stupid better than unroll_runtime_iterations for a loop with a break (or as your implementation, with multiple exists)? I don't like this target hook, it seems like general heuristics can be improved here, but it seems unroll-stupid doesn't consider loops with multiple exits at all? Richard. > So all I have done here is: > 1) Add a target hook so that this is AArch64 specific. > 2) We are not unrolling the loops that decide_unroll_runtime_iterations > would reject. > 3) Out of the ones that decide_unroll_runtime_iterations would accept, > check if the loop has more than 1 exit (this is done in the new target > hook) and if it does, try to unroll using unroll_stupid. > > Regression tested on AArch64 and added the test from the PR. This gives > an overall code size reduction of 2.35% and performance gain of 0.498% > on Spec2017 Intrate. > > Is this ok for trunk? > > Thanks > Sudi > > gcc/ChangeLog: > > 2019-xx-xx Sudakshina Das <sudi....@arm.com> > > PR88398 > * cfgloop.h: Include target.h. > (lpt_dec): Move to... > * target.h (lpt_dec): ... Here. > * target.def: Define TARGET_LOOP_DECISION_ADJUST. > * loop-unroll.c (decide_unroll_runtime_iterations): Use new target > hook. > (decide_unroll_stupid): Likewise. > * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New > function. > (TARGET_LOOP_DECISION_ADJUST): Define for AArch64. > * doc/tm.texi: Regenerated. > * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST. > > gcc/testsuite/ChangeLog: > > 2019-xx-xx Sudakshina Das <sudi....@arm.com> > > PR88398 > * gcc.target/aarch64/pr88398.c: New test.