will schmidt <will_schm...@vnet.ibm.com> writes: Thanks! > On Mon, 2020-07-06 at 15:13 +0800, guojiufu via Gcc-patches wrote: > > Hi, > > Assorted comments below. thanks :-) > >> For very small loops (< 6 insns), it would be fine to unroll 4 >> times to use cache line better. Like below loops: >> `while (i) a[--i] = NULL; while (p < e) *d++ = *p++;` >> >> And for very complex loops which may cause negative impacts: >> branch-miss or cache-miss. Like below loop: there are calls, >> early exits and branches in loop. >> ``` >> for (int i = 0; i < n; i++) { >> int e = a[I]; >> .... >> if (function_call(e)) break; >> .... >> } >> ``` >> >> This patch enhances RTL unroll for small loops and prevent to >> unroll complex loops. > > ok. > >> >> gcc/ChangeLog >> 2020-07-03 Jiufu Guo <guoji...@linux.ibm.com> >> >> * config/rs6000/rs6000.c (rs6000_loop_unroll_adjust): Refine hook. >> (rs6000_complex_loop_p): New function. >> (num_loop_calls): New function. > > Tabs versus spaces. oh, thanks!
> > (num_loop_calls): New function. > > >> --- >> gcc/config/rs6000/rs6000.c | 46 +++++++++++++++++++++++++++++++++--- >> -- >> 1 file changed, 40 insertions(+), 6 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 58f5d780603..a4874fa0efc 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -5130,22 +5130,56 @@ rs6000_destroy_cost_data (void *data) >> free (data); >> } >> >> +/* Count the number of call insns in LOOP. */ >> +static unsigned int >> +num_loop_calls (struct loop *loop) >> +{ >> + basic_block *bbs; >> + rtx_insn *insn; >> + unsigned int i; >> + unsigned int call_ins_num = 0; >> + >> + bbs = get_loop_body (loop); >> + for (i = 0; i < loop->num_nodes; i++) >> + FOR_BB_INSNS (bbs[i], insn) >> + if (CALL_P (insn)) >> + call_ins_num++; >> + >> + free (bbs); >> + >> + return call_ins_num; >> +} > > ok. > > >> + >> +/* Return true if LOOP is too complex to be unrolled. */ >> +static bool >> +rs6000_complex_loop_p (struct loop *loop) >> +{ >> + unsigned call_num; >> + >> + return loop->ninsns > 10 >> + && (call_num = num_loop_calls (loop)) > 0 >> + && (call_num + num_loop_branches (loop)) * 5 > loop->ninsns >> + && !single_exit (loop); >> +} >> + > > > The assignment to call_num within the logic there concerns me. I'd > break that out. > > The 5 value is not explicitly mentioned elsewhere. Contextually this > appears to be evaluating the ratio of branches versus instructions > within the loop. Could use some clarity. Yes, it is 20%. Would make it clarity. Thanks, > > > >> /* Implement targetm.loop_unroll_adjust. */ >> >> static unsigned >> rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop) >> { >> - if (unroll_only_small_loops) >> + if (unroll_only_small_loops) > > indentation fix looks ok. > >> { >> - /* TODO: This is hardcoded to 10 right now. It can be refined, for >> - example we may want to unroll very small loops more times (4 perhaps). >> - We also should use a PARAM for this. */ > > Still hardcoded values, and may still wish to eventually have this as a > tunable param. Probably OK to drop the 2nd sentence, but first and > last sentences should probably stay. > > >> + if (loop->ninsns <= 6) >> + return MIN (4, nunroll); >> if (loop->ninsns <= 10) >> return MIN (2, nunroll); >> - else >> - return 0; >> + >> + return 0; >> } > > > ok > >> >> + if (rs6000_complex_loop_p (loop)) >> + return 0; >> + >> return nunroll; >> } >> > > ok