Re: [PATCH] Check calls before loop unrolling
On Fri, Nov 20, 2020 at 7:11 PM Segher Boessenkool wrote: > > Hi! > > On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote: > > As you know I spend quite some time on inliner heuristics but even after > > the years I have no clear idea how the requirements differs from x86-64 > > to ppc, arm and s390. Clearly compared to x86_64 prologues may get more > > expensive on ppc/arm because of more registers (so we should inline less > > to cold code) and function calls are more expensive (so we sould inline > > more to hot code). We do have PR for that in testusite where most of > > them I looked through. > > I made -fshrink-wrap-separate to make prologues less expensive for stuff > that is only used on the cold paths. This matters a lot -- and much > more could be done there, but that requires changing the generated code, > not just reordering it, so it is harder to do. > > Prologues (and epilogues) are only expensive if they are only needed for > cold code, in a hot function. > > > Problem is that each of us has different metodology - different > > bechmarks to look at > > This is a good thing often as well, it increases our total coverage. > But if not everything sees all results that also hurts :-/ > > > and different opinions on what is good for O2 and > > O3. > > Yeah. The documentation for -O3 merely says "Optimize yet more.", but > that is no guidance at all: why would a user ever use -O2 then? > > I always understood it as "-O2 is always faster than -O1, but -O3 is not > always faster than -O2". Aka "-O2 is always a good choice, and -O3 is a > an even better choice for *some* code, but that needs testing per case". So basically -O2 is supposed to be well-balanced in compile-time, code-size, performance and debuggability (if there's sth like that with optimized code...). -O1 is what you should use for machine-generated code, we kind-of promise to have no quadratic or worse algorithms in compile-time/memory-use here so you can throw a multi-gigabyte source function at GCC and it should not blow up. And -O1 still optimizes. -Os is when you want small code size at all cost (compile-time, less performance). -O3 is when you want performance at all cost (compile-time + code size and the ability to debug) So I'd always use -O2 unless doing a compute workload where I'd chose -O3 (maybe selective for the relevant TUs). Then there's profile-feedback and LTO which are enable them if you can (I'd avoid them for code you need -O1 for). It really helps GCC to make an appropriate decision what code to optimize more/less for the goal of the balanced profile which -O2 is. > In at least that understanding, and also to battle inflation in general, > we probably should move some things from -O3 to -O2. > > > From long term maintenace POV I am worried about changing a lot of > > --param defaults in different backends > > Me too. But changing a few key ones is just too important for > performance :-/ > > > simply becuase the meaning of > > those values keeps changing (as early opts improve; we get better on > > tracking optimizations during IPA passes; and our focus shift from C > > with sane inlines to basic C++ to heavy templatized C++ with many broken > > inline hints to heavy C++ with lto). > > I don't like if targets start to differ too much (in what generic passes > effectively do), no matter what. It's just not maintainable. > > > For this reason I tend to preffer to not tweak in taret specific ways > > unless there is very clear evidence to do so just because I think I will > > not be able to maintain code quality testing in future. > > Yes, completely agreed. But that exception is important :-) > > > It would be very interesting to set up testing that could let us compare > > basic arches side to side to different defaults. Our LNT testing does > > good job for x86-64 but we have basically zero coverage publically > > available on other targets and it is very hard to get inliner relevant > > banchmarks (where SPEC is not the best choice) done in comparable way on > > multiple arches. > > We cannot help with that on the cfarm, unless we get dedicated hardware > for such benchmarking (and I am not holding my breath for that, getting > good coverage at all is hard enough). So you probably need to get such > support for every arch separately, elsewhere :-/ > > > Segher
Re: [PATCH] Check calls before loop unrolling
Hi! On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote: > As you know I spend quite some time on inliner heuristics but even after > the years I have no clear idea how the requirements differs from x86-64 > to ppc, arm and s390. Clearly compared to x86_64 prologues may get more > expensive on ppc/arm because of more registers (so we should inline less > to cold code) and function calls are more expensive (so we sould inline > more to hot code). We do have PR for that in testusite where most of > them I looked through. I made -fshrink-wrap-separate to make prologues less expensive for stuff that is only used on the cold paths. This matters a lot -- and much more could be done there, but that requires changing the generated code, not just reordering it, so it is harder to do. Prologues (and epilogues) are only expensive if they are only needed for cold code, in a hot function. > Problem is that each of us has different metodology - different > bechmarks to look at This is a good thing often as well, it increases our total coverage. But if not everything sees all results that also hurts :-/ > and different opinions on what is good for O2 and > O3. Yeah. The documentation for -O3 merely says "Optimize yet more.", but that is no guidance at all: why would a user ever use -O2 then? I always understood it as "-O2 is always faster than -O1, but -O3 is not always faster than -O2". Aka "-O2 is always a good choice, and -O3 is a an even better choice for *some* code, but that needs testing per case". In at least that understanding, and also to battle inflation in general, we probably should move some things from -O3 to -O2. > From long term maintenace POV I am worried about changing a lot of > --param defaults in different backends Me too. But changing a few key ones is just too important for performance :-/ > simply becuase the meaning of > those values keeps changing (as early opts improve; we get better on > tracking optimizations during IPA passes; and our focus shift from C > with sane inlines to basic C++ to heavy templatized C++ with many broken > inline hints to heavy C++ with lto). I don't like if targets start to differ too much (in what generic passes effectively do), no matter what. It's just not maintainable. > For this reason I tend to preffer to not tweak in taret specific ways > unless there is very clear evidence to do so just because I think I will > not be able to maintain code quality testing in future. Yes, completely agreed. But that exception is important :-) > It would be very interesting to set up testing that could let us compare > basic arches side to side to different defaults. Our LNT testing does > good job for x86-64 but we have basically zero coverage publically > available on other targets and it is very hard to get inliner relevant > banchmarks (where SPEC is not the best choice) done in comparable way on > multiple arches. We cannot help with that on the cfarm, unless we get dedicated hardware for such benchmarking (and I am not holding my breath for that, getting good coverage at all is hard enough). So you probably need to get such support for every arch separately, elsewhere :-/ Segher
Re: [PATCH] Check calls before loop unrolling
> On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote: > > > No, the vast majority of people will *not* (consciously) use them, > > > because the target defaults will set things to useful values. > > > > > > The compiler could use saner "generic" defaults perhaps, but those will > > > still not be satisfactory for anyone (except when they aren't generic in > > > fact but instead tuned for one arch ;-) ) -- unrolling is just too > > > important for performance. > > Then fix the heuristics, don't add new PARAMS :-) > > I just said that cannot work? > > > It didn't even occur to me until now that you may be pushing to have the > > ppc backend have different values for the PARAMS. I would strongly > > discourage that. It's been a huge headache in the s390 backend already. > > It also makes a huge performance difference. That the generic parts > of GCC are only tuned for x86 (or not well tuned for anything?) is a > huge roadblock for us. As you know I spend quite some time on inliner heuristics but even after the years I have no clear idea how the requirements differs from x86-64 to ppc, arm and s390. Clearly compared to x86_64 prologues may get more expensive on ppc/arm because of more registers (so we should inline less to cold code) and function calls are more expensive (so we sould inline more to hot code). We do have PR for that in testusite where most of them I looked through. Problem is that each of us has different metodology - different bechmarks to look at and different opinions on what is good for O2 and O3. From long term maintenace POV I am worried about changing a lot of --param defaults in different backends simply becuase the meaning of those values keeps changing (as early opts improve; we get better on tracking optimizations during IPA passes; and our focus shift from C with sane inlines to basic C++ to heavy templatized C++ with many broken inline hints to heavy C++ with lto). For this reason I tend to preffer to not tweak in taret specific ways unless there is very clear evidence to do so just because I think I will not be able to maintain code quality testing in future. It would be very interesting to set up testing that could let us compare basic arches side to side to different defaults. Our LNT testing does good job for x86-64 but we have basically zero coverage publically available on other targets and it is very hard to get inliner relevant banchmarks (where SPEC is not the best choice) done in comparable way on multiple arches. Honza
Re: [PATCH] Check calls before loop unrolling
On Fri, Nov 20, 2020 at 2:48 AM Richard Biener wrote: > > On Fri, Nov 20, 2020 at 12:58 AM Segher Boessenkool > wrote: > > > > On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote: > > > > No, the vast majority of people will *not* (consciously) use them, > > > > because the target defaults will set things to useful values. > > > > > > > > The compiler could use saner "generic" defaults perhaps, but those will > > > > still not be satisfactory for anyone (except when they aren't generic in > > > > fact but instead tuned for one arch ;-) ) -- unrolling is just too > > > > important for performance. > > > Then fix the heuristics, don't add new PARAMS :-) > > > > I just said that cannot work? > > > > > It didn't even occur to me until now that you may be pushing to have the > > > ppc backend have different values for the PARAMS. I would strongly > > > discourage that. It's been a huge headache in the s390 backend already. > > > > It also makes a huge performance difference. That the generic parts > > of GCC are only tuned for x86 (or not well tuned for anything?) is a > > huge roadblock for us. > > > > I am not saying we should have six hundred different tunings. But we > > need a few (and we already *have* a few, not params but generic flags, > > just like many other targets fwiw). > > > > We *do* have a few custom param settings already, just like aarch64, > > ia64, and sh, actually. > > > > > >> In my mind fixing things so they work with no magic arguments is best. > > > >> PARAMS are the worst solution. A -f flag with no arguments is > > > >> somewhere > > > >> in between. Others may clearly have different opinions here. > > > > There is no big difference between params and flags here, IMO -- it has > > > > to be a -f with a value as well, for good results. > > > Which is a signal that we have a deeper problem. -f with a value is no > > > different than a param. > > > > Yes exactly. > > > > > > Since we have (almost) all such tunings in --param already, I'd say this > > > > one belongs there as well? > > > I'm not convinced at this point. > > > > Why not? > > > > We have way many params, yes. > > --params were introduced to avoid "magic numbers" in code and at the > same time not overwhelm users with many -f options. That they are > runtime-controllable was probably done because we could and because > it's nice for GCC developers. GCC historically has not done a good job at loop unrolling. And tuning of loop unrolling is inherently architecture- and microarchitecture-specific. There is no "better heuristic". There is nothing inherent in the instruction stream to determine an optimal unrolling that is correct for x86 and AArch64 and RISC-V and s390x and Power. Based on what academic literature or experience of other compilers have you determined that this limitation can be addressed with "fix the heuristics"? The patch *IS* trying to fix the heuristics. The heuristics require additional, processor-specific information. And a parameter is the natural mechanism in GCC to provide a numerical value to adjust a heuristic. As Richard wrote, the GCC community chose to collect the "magic numbers" in a centralized table with a consistent interface that can be overridden by individual ports. The ability to override the parameters on the command line was for convenience and ease of development. It's not meant as a value that any end-user normally will adjust. But there is no reason to arbitrarily start a campaign against parameters. GCC supports a large number of targets, and that requires the ability to adjust the optimization and transformation behavior of the compiler to achieve the best performance on a wide variety of processors. We would appreciate it if you would not block a patch that improves the code generation of GCC on Power (and other targets) because of an aesthetic concern about too many parameters in GCC. That ship has sailed. Thanks, David > > > But the first step to counteract that > > would be to deprecate and get rid of many existing ones, not to block > > having new ones which can be useful (while many of the existing ones are > > not). > > Not sure about this - sure, if heuristic can be simplified to use N < M > (previous) "magic" numbers that's better. But if "deprecating" just > involves pasting the current --param default literally into the heuristcs > then no, please not. > > For this particular patch the question is if the heuristic is sound, > not the particular magic number. And I have no opinion about this > (being this is the RTL unroller). > > Richard. > > > > > Or, we could accept that it is not really a problem at all. You seem to > > have a strong opinion that it *is*, but I don't understand that; maybe > > you can explain a bit more? > > > > Thanks, > > > > > > Segher
Re: [PATCH] Check calls before loop unrolling
On Fri, Nov 20, 2020 at 12:58 AM Segher Boessenkool wrote: > > On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote: > > > No, the vast majority of people will *not* (consciously) use them, > > > because the target defaults will set things to useful values. > > > > > > The compiler could use saner "generic" defaults perhaps, but those will > > > still not be satisfactory for anyone (except when they aren't generic in > > > fact but instead tuned for one arch ;-) ) -- unrolling is just too > > > important for performance. > > Then fix the heuristics, don't add new PARAMS :-) > > I just said that cannot work? > > > It didn't even occur to me until now that you may be pushing to have the > > ppc backend have different values for the PARAMS. I would strongly > > discourage that. It's been a huge headache in the s390 backend already. > > It also makes a huge performance difference. That the generic parts > of GCC are only tuned for x86 (or not well tuned for anything?) is a > huge roadblock for us. > > I am not saying we should have six hundred different tunings. But we > need a few (and we already *have* a few, not params but generic flags, > just like many other targets fwiw). > > We *do* have a few custom param settings already, just like aarch64, > ia64, and sh, actually. > > > >> In my mind fixing things so they work with no magic arguments is best. > > >> PARAMS are the worst solution. A -f flag with no arguments is somewhere > > >> in between. Others may clearly have different opinions here. > > > There is no big difference between params and flags here, IMO -- it has > > > to be a -f with a value as well, for good results. > > Which is a signal that we have a deeper problem. -f with a value is no > > different than a param. > > Yes exactly. > > > > Since we have (almost) all such tunings in --param already, I'd say this > > > one belongs there as well? > > I'm not convinced at this point. > > Why not? > > We have way many params, yes. --params were introduced to avoid "magic numbers" in code and at the same time not overwhelm users with many -f options. That they are runtime-controllable was probably done because we could and because it's nice for GCC developers. > But the first step to counteract that > would be to deprecate and get rid of many existing ones, not to block > having new ones which can be useful (while many of the existing ones are > not). Not sure about this - sure, if heuristic can be simplified to use N < M (previous) "magic" numbers that's better. But if "deprecating" just involves pasting the current --param default literally into the heuristcs then no, please not. For this particular patch the question is if the heuristic is sound, not the particular magic number. And I have no opinion about this (being this is the RTL unroller). Richard. > > Or, we could accept that it is not really a problem at all. You seem to > have a strong opinion that it *is*, but I don't understand that; maybe > you can explain a bit more? > > Thanks, > > > Segher
Re: [PATCH] Check calls before loop unrolling
On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote: > > No, the vast majority of people will *not* (consciously) use them, > > because the target defaults will set things to useful values. > > > > The compiler could use saner "generic" defaults perhaps, but those will > > still not be satisfactory for anyone (except when they aren't generic in > > fact but instead tuned for one arch ;-) ) -- unrolling is just too > > important for performance. > Then fix the heuristics, don't add new PARAMS :-) I just said that cannot work? > It didn't even occur to me until now that you may be pushing to have the > ppc backend have different values for the PARAMS. I would strongly > discourage that. It's been a huge headache in the s390 backend already. It also makes a huge performance difference. That the generic parts of GCC are only tuned for x86 (or not well tuned for anything?) is a huge roadblock for us. I am not saying we should have six hundred different tunings. But we need a few (and we already *have* a few, not params but generic flags, just like many other targets fwiw). We *do* have a few custom param settings already, just like aarch64, ia64, and sh, actually. > >> In my mind fixing things so they work with no magic arguments is best. > >> PARAMS are the worst solution. A -f flag with no arguments is somewhere > >> in between. Others may clearly have different opinions here. > > There is no big difference between params and flags here, IMO -- it has > > to be a -f with a value as well, for good results. > Which is a signal that we have a deeper problem. -f with a value is no > different than a param. Yes exactly. > > Since we have (almost) all such tunings in --param already, I'd say this > > one belongs there as well? > I'm not convinced at this point. Why not? We have way many params, yes. But the first step to counteract that would be to deprecate and get rid of many existing ones, not to block having new ones which can be useful (while many of the existing ones are not). Or, we could accept that it is not really a problem at all. You seem to have a strong opinion that it *is*, but I don't understand that; maybe you can explain a bit more? Thanks, Segher
Re: [PATCH] Check calls before loop unrolling
On 11/19/20 1:01 PM, Segher Boessenkool wrote: > On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote: >> On 11/19/20 12:42 PM, Segher Boessenkool wrote: >>> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote: On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote: > guojiufu writes: >> When unroll loops, if there are calls inside the loop, those calls >> may raise negative impacts for unrolling. This patch adds a param >> param_max_unrolled_calls, and checks if the number of calls inside >> the loop bigger than this param, loop is prevent from unrolling. >> >> This patch is checking the _average_ number of calls which is the >> summary of call numbers multiply the possibility of the call maybe >> executed. The _average_ number could be a fraction, to keep the >> precision, the param is the threshold number multiply 1. >> >> Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? >> >> gcc/ChangeLog >> 2020-08-19 Jiufu Guo >> >> * params.opt (param_max_unrolled_average_calls_x1): New param. >> * cfgloop.h (average_num_loop_calls): New declare. >> * cfgloopanal.c (average_num_loop_calls): New function. >> * loop-unroll.c (decide_unroll_constant_iteration, >> decide_unroll_runtime_iterations, >> decide_unroll_stupid): Check average_num_loop_calls and >> param_max_unrolled_average_calls_x1. So what's the motivation behind adding a PARAM to control this behavior? I'm not a big fan of exposing a lot of PARAMs for users to tune behavior (though I've made the same lapse in judgment myself). In my mind a PARAM is really more about controlling pathological behavior. >>> But we (Power) need very different tuning than what others apparently >>> need. It is similar to inlining, in that that also differs a lot >>> between archs how aggressively to do that optimally. >> But what I think that argues is that we've got a gap in the costing >> model and/or how its being used. Throwing PARAMS at the problem isn't >> really useful for the end user. The vast majority aren't going to use >> them and of the ones that do, most are probably going to get it wrong. > No, the vast majority of people will *not* (consciously) use them, > because the target defaults will set things to useful values. > > The compiler could use saner "generic" defaults perhaps, but those will > still not be satisfactory for anyone (except when they aren't generic in > fact but instead tuned for one arch ;-) ) -- unrolling is just too > important for performance. Then fix the heuristics, don't add new PARAMS :-) It didn't even occur to me until now that you may be pushing to have the ppc backend have different values for the PARAMS. I would strongly discourage that. It's been a huge headache in the s390 backend already. > >> In my mind fixing things so they work with no magic arguments is best. >> PARAMS are the worst solution. A -f flag with no arguments is somewhere >> in between. Others may clearly have different opinions here. > There is no big difference between params and flags here, IMO -- it has > to be a -f with a value as well, for good results. Which is a signal that we have a deeper problem. -f with a value is no different than a param. > > Since we have (almost) all such tunings in --param already, I'd say this > one belongs there as well? I'm not convinced at this point. jeff
Re: [PATCH] Check calls before loop unrolling
On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote: > On 11/19/20 12:42 PM, Segher Boessenkool wrote: > > On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote: > >> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote: > >>> guojiufu writes: > When unroll loops, if there are calls inside the loop, those calls > may raise negative impacts for unrolling. This patch adds a param > param_max_unrolled_calls, and checks if the number of calls inside > the loop bigger than this param, loop is prevent from unrolling. > > This patch is checking the _average_ number of calls which is the > summary of call numbers multiply the possibility of the call maybe > executed. The _average_ number could be a fraction, to keep the > precision, the param is the threshold number multiply 1. > > Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? > > gcc/ChangeLog > 2020-08-19 Jiufu Guo > > * params.opt (param_max_unrolled_average_calls_x1): New param. > * cfgloop.h (average_num_loop_calls): New declare. > * cfgloopanal.c (average_num_loop_calls): New function. > * loop-unroll.c (decide_unroll_constant_iteration, > decide_unroll_runtime_iterations, > decide_unroll_stupid): Check average_num_loop_calls and > param_max_unrolled_average_calls_x1. > >> So what's the motivation behind adding a PARAM to control this > >> behavior? I'm not a big fan of exposing a lot of PARAMs for users to > >> tune behavior (though I've made the same lapse in judgment myself). In > >> my mind a PARAM is really more about controlling pathological behavior. > > But we (Power) need very different tuning than what others apparently > > need. It is similar to inlining, in that that also differs a lot > > between archs how aggressively to do that optimally. > But what I think that argues is that we've got a gap in the costing > model and/or how its being used. Throwing PARAMS at the problem isn't > really useful for the end user. The vast majority aren't going to use > them and of the ones that do, most are probably going to get it wrong. No, the vast majority of people will *not* (consciously) use them, because the target defaults will set things to useful values. The compiler could use saner "generic" defaults perhaps, but those will still not be satisfactory for anyone (except when they aren't generic in fact but instead tuned for one arch ;-) ) -- unrolling is just too important for performance. > In my mind fixing things so they work with no magic arguments is best. > PARAMS are the worst solution. A -f flag with no arguments is somewhere > in between. Others may clearly have different opinions here. There is no big difference between params and flags here, IMO -- it has to be a -f with a value as well, for good results. Since we have (almost) all such tunings in --param already, I'd say this one belongs there as well? Segher
Re: [PATCH] Check calls before loop unrolling
On 11/19/20 12:42 PM, Segher Boessenkool wrote: > On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote: >> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote: >>> guojiufu writes: When unroll loops, if there are calls inside the loop, those calls may raise negative impacts for unrolling. This patch adds a param param_max_unrolled_calls, and checks if the number of calls inside the loop bigger than this param, loop is prevent from unrolling. This patch is checking the _average_ number of calls which is the summary of call numbers multiply the possibility of the call maybe executed. The _average_ number could be a fraction, to keep the precision, the param is the threshold number multiply 1. Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? gcc/ChangeLog 2020-08-19 Jiufu Guo * params.opt (param_max_unrolled_average_calls_x1): New param. * cfgloop.h (average_num_loop_calls): New declare. * cfgloopanal.c (average_num_loop_calls): New function. * loop-unroll.c (decide_unroll_constant_iteration, decide_unroll_runtime_iterations, decide_unroll_stupid): Check average_num_loop_calls and param_max_unrolled_average_calls_x1. >> So what's the motivation behind adding a PARAM to control this >> behavior? I'm not a big fan of exposing a lot of PARAMs for users to >> tune behavior (though I've made the same lapse in judgment myself). In >> my mind a PARAM is really more about controlling pathological behavior. > But we (Power) need very different tuning than what others apparently > need. It is similar to inlining, in that that also differs a lot > between archs how aggressively to do that optimally. But what I think that argues is that we've got a gap in the costing model and/or how its being used. Throwing PARAMS at the problem isn't really useful for the end user. The vast majority aren't going to use them and of the ones that do, most are probably going to get it wrong. In my mind fixing things so they work with no magic arguments is best. PARAMS are the worst solution. A -f flag with no arguments is somewhere in between. Others may clearly have different opinions here. jeff
Re: [PATCH] Check calls before loop unrolling
On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote: > On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote: > > guojiufu writes: > >> When unroll loops, if there are calls inside the loop, those calls > >> may raise negative impacts for unrolling. This patch adds a param > >> param_max_unrolled_calls, and checks if the number of calls inside > >> the loop bigger than this param, loop is prevent from unrolling. > >> > >> This patch is checking the _average_ number of calls which is the > >> summary of call numbers multiply the possibility of the call maybe > >> executed. The _average_ number could be a fraction, to keep the > >> precision, the param is the threshold number multiply 1. > >> > >> Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? > >> > >> gcc/ChangeLog > >> 2020-08-19 Jiufu Guo > >> > >>* params.opt (param_max_unrolled_average_calls_x1): New param. > >>* cfgloop.h (average_num_loop_calls): New declare. > >>* cfgloopanal.c (average_num_loop_calls): New function. > >>* loop-unroll.c (decide_unroll_constant_iteration, > >>decide_unroll_runtime_iterations, > >>decide_unroll_stupid): Check average_num_loop_calls and > >>param_max_unrolled_average_calls_x1. > So what's the motivation behind adding a PARAM to control this > behavior? I'm not a big fan of exposing a lot of PARAMs for users to > tune behavior (though I've made the same lapse in judgment myself). In > my mind a PARAM is really more about controlling pathological behavior. But we (Power) need very different tuning than what others apparently need. It is similar to inlining, in that that also differs a lot between archs how aggressively to do that optimally. Segher
Re: [PATCH] Check calls before loop unrolling
On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote: > guojiufu writes: > > Hi, > > In this patch, the default value of > param=max-unrolled-average-calls-x1 is '0', which means to unroll > a loop, there should be no call inside the body. Do I need to set the > default value to a bigger value (16?) for later tune? Biger value will > keep the behavior unchanged. > > And is this patch ok for trunk? Thanks a lot for you comments! > > BR. > Jiufu. > > >> Hi, >> >> When unroll loops, if there are calls inside the loop, those calls >> may raise negative impacts for unrolling. This patch adds a param >> param_max_unrolled_calls, and checks if the number of calls inside >> the loop bigger than this param, loop is prevent from unrolling. >> >> This patch is checking the _average_ number of calls which is the >> summary of call numbers multiply the possibility of the call maybe >> executed. The _average_ number could be a fraction, to keep the >> precision, the param is the threshold number multiply 1. >> >> Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? >> >> gcc/ChangeLog >> 2020-08-19 Jiufu Guo >> >> * params.opt (param_max_unrolled_average_calls_x1): New param. >> * cfgloop.h (average_num_loop_calls): New declare. >> * cfgloopanal.c (average_num_loop_calls): New function. >> * loop-unroll.c (decide_unroll_constant_iteration, >> decide_unroll_runtime_iterations, >> decide_unroll_stupid): Check average_num_loop_calls and >> param_max_unrolled_average_calls_x1. So what's the motivation behind adding a PARAM to control this behavior? I'm not a big fan of exposing a lot of PARAMs for users to tune behavior (though I've made the same lapse in judgment myself). In my mind a PARAM is really more about controlling pathological behavior. jeff
Re: [PATCH] Check calls before loop unrolling
Hi all, This patch sets the default value to 16 for parameter max_unrolled_average_calls which could be used to restict calls in loop when unrolling. This default value(16) is a big number which keeps current behavior for almost all cases. Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? Thanks for comments! Jiufu Guo gcc/ChangeLog 2020-09-16 Jiufu Guo * params.opt (param_max_unrolled_average_calls_x1): New param. * cfgloop.h (average_num_loop_calls): New declare. * cfgloopanal.c (average_num_loop_calls): New function. * loop-unroll.c (decide_unroll_constant_iteration, decide_unroll_runtime_iterations, decide_unroll_stupid): Check average_num_loop_calls and param_max_unrolled_average_calls_x1. --- gcc/cfgloop.h | 2 ++ gcc/cfgloopanal.c | 25 + gcc/loop-unroll.c | 10 ++ gcc/params.opt| 4 4 files changed, 41 insertions(+) diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 18b404e292f..dab933da150 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see #define GCC_CFGLOOP_H #include "cfgloopmanip.h" +#include "sreal.h" /* Structure to hold decision about unrolling/peeling. */ enum lpt_dec @@ -387,6 +388,7 @@ extern vec get_loop_exit_edges (const class loop *, basic_block * = NULL); extern edge single_exit (const class loop *); extern edge single_likely_exit (class loop *loop, vec); extern unsigned num_loop_branches (const class loop *); +extern sreal average_num_loop_calls (const class loop *); extern edge loop_preheader_edge (const class loop *); extern edge loop_latch_edge (const class loop *); diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c index 0b33e8272a7..a314db4e0c0 100644 --- a/gcc/cfgloopanal.c +++ b/gcc/cfgloopanal.c @@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop) return ret; } +/* Count the number of call insns in LOOP. */ +sreal +average_num_loop_calls (const class loop *loop) +{ + basic_block *bbs; + rtx_insn *insn; + unsigned int i, bncalls; + sreal ncalls = 0; + + bbs = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) +{ + bncalls = 0; + FOR_BB_INSNS (bbs[i], insn) + if (CALL_P (insn)) + bncalls++; + + ncalls += (sreal) bncalls + * bbs[i]->count.to_sreal_scale (loop->header->count); +} + free (bbs); + + return ncalls; +} + /* Returns expected number of iterations of LOOP, according to measured or guessed profile. diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c index 693c7768868..56b8fb37d2a 100644 --- a/gcc/loop-unroll.c +++ b/gcc/loop-unroll.c @@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int flags) nunroll = nunroll_by_av; if (nunroll > (unsigned) param_max_unroll_times) nunroll = param_max_unroll_times; + if (!loop->unroll + && (average_num_loop_calls (loop) * (sreal) 1).to_int () + > (unsigned) param_max_unrolled_average_calls_x1) +nunroll = 0; if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); @@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int flags) nunroll = nunroll_by_av; if (nunroll > (unsigned) param_max_unroll_times) nunroll = param_max_unroll_times; + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () + > (unsigned) param_max_unrolled_average_calls_x1) +nunroll = 0; if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); @@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags) nunroll = nunroll_by_av; if (nunroll > (unsigned) param_max_unroll_times) nunroll = param_max_unroll_times; + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () + > (unsigned) param_max_unrolled_average_calls_x1) +nunroll = 0; if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); diff --git a/gcc/params.opt b/gcc/params.opt index f39e5d1a012..80605861223 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop. Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param Optimization The maximum number of instructions to consider to unroll in a loop. +-param=max-unrolled-average-calls-x1= +Common Joined UInteger Var(param_max_unrolled_average_calls_x1) Init(16) Param Optimization +The maximum number of calls to consider to unroll in a loop on average and multiply 1. + -param=max-unswitch-insns= Common Joined UInteger Var(param_max_unswitch_insns) Init(50) Param Optimization The maximum number of insns of an unswitched loop. -- 2.25.1 Jan Hubicka writes: >> On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches >> wrote: >> > >> > Hi, >> > >> > When unroll loops, if there are cal
Re: [PATCH] Check calls before loop unrolling
guojiufu writes: Hi, In this patch, the default value of param=max-unrolled-average-calls-x1 is '0', which means to unroll a loop, there should be no call inside the body. Do I need to set the default value to a bigger value (16?) for later tune? Biger value will keep the behavior unchanged. And is this patch ok for trunk? Thanks a lot for you comments! BR. Jiufu. > Hi, > > When unroll loops, if there are calls inside the loop, those calls > may raise negative impacts for unrolling. This patch adds a param > param_max_unrolled_calls, and checks if the number of calls inside > the loop bigger than this param, loop is prevent from unrolling. > > This patch is checking the _average_ number of calls which is the > summary of call numbers multiply the possibility of the call maybe > executed. The _average_ number could be a fraction, to keep the > precision, the param is the threshold number multiply 1. > > Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? > > gcc/ChangeLog > 2020-08-19 Jiufu Guo > > * params.opt (param_max_unrolled_average_calls_x1): New param. > * cfgloop.h (average_num_loop_calls): New declare. > * cfgloopanal.c (average_num_loop_calls): New function. > * loop-unroll.c (decide_unroll_constant_iteration, > decide_unroll_runtime_iterations, > decide_unroll_stupid): Check average_num_loop_calls and > param_max_unrolled_average_calls_x1. > --- > gcc/cfgloop.h | 2 ++ > gcc/cfgloopanal.c | 25 + > gcc/loop-unroll.c | 10 ++ > gcc/params.opt| 4 > 4 files changed, 41 insertions(+) > > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h > index 18b404e292f..dab933da150 100644 > --- a/gcc/cfgloop.h > +++ b/gcc/cfgloop.h > @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see > #define GCC_CFGLOOP_H > > #include "cfgloopmanip.h" > +#include "sreal.h" > > /* Structure to hold decision about unrolling/peeling. */ > enum lpt_dec > @@ -387,6 +388,7 @@ extern vec get_loop_exit_edges (const class loop *, > basic_block * = NULL); > extern edge single_exit (const class loop *); > extern edge single_likely_exit (class loop *loop, vec); > extern unsigned num_loop_branches (const class loop *); > +extern sreal average_num_loop_calls (const class loop *); > > extern edge loop_preheader_edge (const class loop *); > extern edge loop_latch_edge (const class loop *); > diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c > index 0b33e8272a7..a314db4e0c0 100644 > --- a/gcc/cfgloopanal.c > +++ b/gcc/cfgloopanal.c > @@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop) >return ret; > } > > +/* Count the number of call insns in LOOP. */ > +sreal > +average_num_loop_calls (const class loop *loop) > +{ > + basic_block *bbs; > + rtx_insn *insn; > + unsigned int i, bncalls; > + sreal ncalls = 0; > + > + bbs = get_loop_body (loop); > + for (i = 0; i < loop->num_nodes; i++) > +{ > + bncalls = 0; > + FOR_BB_INSNS (bbs[i], insn) > + if (CALL_P (insn)) > + bncalls++; > + > + ncalls += (sreal) bncalls > + * bbs[i]->count.to_sreal_scale (loop->header->count); > +} > + free (bbs); > + > + return ncalls; > +} > + > /* Returns expected number of iterations of LOOP, according to > measured or guessed profile. > > diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c > index 693c7768868..56b8fb37d2a 100644 > --- a/gcc/loop-unroll.c > +++ b/gcc/loop-unroll.c > @@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int > flags) > nunroll = nunroll_by_av; >if (nunroll > (unsigned) param_max_unroll_times) > nunroll = param_max_unroll_times; > + if (!loop->unroll > + && (average_num_loop_calls (loop) * (sreal) 1).to_int () > +> (unsigned) param_max_unrolled_average_calls_x1) > +nunroll = 0; > >if (targetm.loop_unroll_adjust) > nunroll = targetm.loop_unroll_adjust (nunroll, loop); > @@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int > flags) > nunroll = nunroll_by_av; >if (nunroll > (unsigned) param_max_unroll_times) > nunroll = param_max_unroll_times; > + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () > + > (unsigned) param_max_unrolled_average_calls_x1) > +nunroll = 0; > >if (targetm.loop_unroll_adjust) > nunroll = targetm.loop_unroll_adjust (nunroll, loop); > @@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags) > nunroll = nunroll_by_av; >if (nunroll > (unsigned) param_max_unroll_times) > nunroll = param_max_unroll_times; > + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () > + > (unsigned) param_max_unrolled_average_calls_x1) > +nunroll = 0; > >if (targetm.loop_unroll_adjust) > nunroll = targetm.loop_unroll_adjust (nunroll, loop); > diff --git a/gcc/params.opt b/gcc/params.opt > index f39e5d1
Re: [PATCH] Check calls before loop unrolling
On 2020-08-24 19:16, Jan Hubicka wrote: On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches wrote: > > Hi, > > This patch is checking the _average_ number of calls which is the > summary of call numbers multiply the possibility of the call maybe > executed. The _average_ number could be a fraction, to keep the > precision, the param is the threshold number multiply 1. > Can you try mimicking what try_unroll_loop_completely on GIMPLE does instead? IIRC the main motivation to not unroll calls is the spilling code around it which we cannot estimate very well. And that spilling happens irrespective of whether the call is in a hot or cold path so I'm not sure it makes sense to use the "average" number of calls here. In try_unroll_loop_completely, it is checking the calls in the hot path: num_non_pure_calls_on_hot_path), and avoid unrolling if there is. This is one reason for here to use "average". As long as I remember, we excluded calls simply becuase it is/was an expensive intruction so it was an indication that the loop overhead is small compared to the overhead of loop body. Thanks, Honza and Richard! Honza
Re: [PATCH] Check calls before loop unrolling
> On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches > wrote: > > > > Hi, > > > > When unroll loops, if there are calls inside the loop, those calls > > may raise negative impacts for unrolling. This patch adds a param > > param_max_unrolled_calls, and checks if the number of calls inside > > the loop bigger than this param, loop is prevent from unrolling. > > > > This patch is checking the _average_ number of calls which is the > > summary of call numbers multiply the possibility of the call maybe > > executed. The _average_ number could be a fraction, to keep the > > precision, the param is the threshold number multiply 1. > > > > Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? > > Can you try mimicking what try_unroll_loop_completely on GIMPLE does > instead? IIRC the main motivation to not unroll calls is the spilling code > around it which we cannot estimate very well. And that spilling happens > irrespective of whether the call is in a hot or cold path so I'm not sure > it makes sense to use the "average" number of calls here. As long as I remember, we excluded calls simply becuase it is/was an expensive intruction so it was an indication that the loop overhead is small compared to the overhead of loop body. Honza
Re: [PATCH] Check calls before loop unrolling
On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches wrote: > > Hi, > > When unroll loops, if there are calls inside the loop, those calls > may raise negative impacts for unrolling. This patch adds a param > param_max_unrolled_calls, and checks if the number of calls inside > the loop bigger than this param, loop is prevent from unrolling. > > This patch is checking the _average_ number of calls which is the > summary of call numbers multiply the possibility of the call maybe > executed. The _average_ number could be a fraction, to keep the > precision, the param is the threshold number multiply 1. > > Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? Can you try mimicking what try_unroll_loop_completely on GIMPLE does instead? IIRC the main motivation to not unroll calls is the spilling code around it which we cannot estimate very well. And that spilling happens irrespective of whether the call is in a hot or cold path so I'm not sure it makes sense to use the "average" number of calls here. Richard. > gcc/ChangeLog > 2020-08-19 Jiufu Guo > > * params.opt (param_max_unrolled_average_calls_x1): New param. > * cfgloop.h (average_num_loop_calls): New declare. > * cfgloopanal.c (average_num_loop_calls): New function. > * loop-unroll.c (decide_unroll_constant_iteration, > decide_unroll_runtime_iterations, > decide_unroll_stupid): Check average_num_loop_calls and > param_max_unrolled_average_calls_x1. > --- > gcc/cfgloop.h | 2 ++ > gcc/cfgloopanal.c | 25 + > gcc/loop-unroll.c | 10 ++ > gcc/params.opt| 4 > 4 files changed, 41 insertions(+) > > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h > index 18b404e292f..dab933da150 100644 > --- a/gcc/cfgloop.h > +++ b/gcc/cfgloop.h > @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see > #define GCC_CFGLOOP_H > > #include "cfgloopmanip.h" > +#include "sreal.h" > > /* Structure to hold decision about unrolling/peeling. */ > enum lpt_dec > @@ -387,6 +388,7 @@ extern vec get_loop_exit_edges (const class loop *, > basic_block * = NULL); > extern edge single_exit (const class loop *); > extern edge single_likely_exit (class loop *loop, vec); > extern unsigned num_loop_branches (const class loop *); > +extern sreal average_num_loop_calls (const class loop *); > > extern edge loop_preheader_edge (const class loop *); > extern edge loop_latch_edge (const class loop *); > diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c > index 0b33e8272a7..a314db4e0c0 100644 > --- a/gcc/cfgloopanal.c > +++ b/gcc/cfgloopanal.c > @@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop) >return ret; > } > > +/* Count the number of call insns in LOOP. */ > +sreal > +average_num_loop_calls (const class loop *loop) > +{ > + basic_block *bbs; > + rtx_insn *insn; > + unsigned int i, bncalls; > + sreal ncalls = 0; > + > + bbs = get_loop_body (loop); > + for (i = 0; i < loop->num_nodes; i++) > +{ > + bncalls = 0; > + FOR_BB_INSNS (bbs[i], insn) > + if (CALL_P (insn)) > + bncalls++; > + > + ncalls += (sreal) bncalls > + * bbs[i]->count.to_sreal_scale (loop->header->count); > +} > + free (bbs); > + > + return ncalls; > +} > + > /* Returns expected number of iterations of LOOP, according to > measured or guessed profile. > > diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c > index 693c7768868..56b8fb37d2a 100644 > --- a/gcc/loop-unroll.c > +++ b/gcc/loop-unroll.c > @@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int > flags) > nunroll = nunroll_by_av; >if (nunroll > (unsigned) param_max_unroll_times) > nunroll = param_max_unroll_times; > + if (!loop->unroll > + && (average_num_loop_calls (loop) * (sreal) 1).to_int () > + > (unsigned) param_max_unrolled_average_calls_x1) > +nunroll = 0; > >if (targetm.loop_unroll_adjust) > nunroll = targetm.loop_unroll_adjust (nunroll, loop); > @@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int > flags) > nunroll = nunroll_by_av; >if (nunroll > (unsigned) param_max_unroll_times) > nunroll = param_max_unroll_times; > + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () > + > (unsigned) param_max_unrolled_average_calls_x1) > +nunroll = 0; > >if (targetm.loop_unroll_adjust) > nunroll = targetm.loop_unroll_adjust (nunroll, loop); > @@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags) > nunroll = nunroll_by_av; >if (nunroll > (unsigned) param_max_unroll_times) > nunroll = param_max_unroll_times; > + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () > + > (unsigned) param_max_unrolled_average_calls_x1) > +nunroll = 0; > >if (targetm.loop_unroll_adjust) > nunroll = targetm.loop_unroll_adjust (nunroll, loop); > diff --g
[PATCH] Check calls before loop unrolling
Hi, When unroll loops, if there are calls inside the loop, those calls may raise negative impacts for unrolling. This patch adds a param param_max_unrolled_calls, and checks if the number of calls inside the loop bigger than this param, loop is prevent from unrolling. This patch is checking the _average_ number of calls which is the summary of call numbers multiply the possibility of the call maybe executed. The _average_ number could be a fraction, to keep the precision, the param is the threshold number multiply 1. Bootstrap and regtest pass on powerpc64le. Is this ok for trunk? gcc/ChangeLog 2020-08-19 Jiufu Guo * params.opt (param_max_unrolled_average_calls_x1): New param. * cfgloop.h (average_num_loop_calls): New declare. * cfgloopanal.c (average_num_loop_calls): New function. * loop-unroll.c (decide_unroll_constant_iteration, decide_unroll_runtime_iterations, decide_unroll_stupid): Check average_num_loop_calls and param_max_unrolled_average_calls_x1. --- gcc/cfgloop.h | 2 ++ gcc/cfgloopanal.c | 25 + gcc/loop-unroll.c | 10 ++ gcc/params.opt| 4 4 files changed, 41 insertions(+) diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 18b404e292f..dab933da150 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see #define GCC_CFGLOOP_H #include "cfgloopmanip.h" +#include "sreal.h" /* Structure to hold decision about unrolling/peeling. */ enum lpt_dec @@ -387,6 +388,7 @@ extern vec get_loop_exit_edges (const class loop *, basic_block * = NULL); extern edge single_exit (const class loop *); extern edge single_likely_exit (class loop *loop, vec); extern unsigned num_loop_branches (const class loop *); +extern sreal average_num_loop_calls (const class loop *); extern edge loop_preheader_edge (const class loop *); extern edge loop_latch_edge (const class loop *); diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c index 0b33e8272a7..a314db4e0c0 100644 --- a/gcc/cfgloopanal.c +++ b/gcc/cfgloopanal.c @@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop) return ret; } +/* Count the number of call insns in LOOP. */ +sreal +average_num_loop_calls (const class loop *loop) +{ + basic_block *bbs; + rtx_insn *insn; + unsigned int i, bncalls; + sreal ncalls = 0; + + bbs = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) +{ + bncalls = 0; + FOR_BB_INSNS (bbs[i], insn) + if (CALL_P (insn)) + bncalls++; + + ncalls += (sreal) bncalls + * bbs[i]->count.to_sreal_scale (loop->header->count); +} + free (bbs); + + return ncalls; +} + /* Returns expected number of iterations of LOOP, according to measured or guessed profile. diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c index 693c7768868..56b8fb37d2a 100644 --- a/gcc/loop-unroll.c +++ b/gcc/loop-unroll.c @@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int flags) nunroll = nunroll_by_av; if (nunroll > (unsigned) param_max_unroll_times) nunroll = param_max_unroll_times; + if (!loop->unroll + && (average_num_loop_calls (loop) * (sreal) 1).to_int () + > (unsigned) param_max_unrolled_average_calls_x1) +nunroll = 0; if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); @@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int flags) nunroll = nunroll_by_av; if (nunroll > (unsigned) param_max_unroll_times) nunroll = param_max_unroll_times; + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () + > (unsigned) param_max_unrolled_average_calls_x1) +nunroll = 0; if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); @@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags) nunroll = nunroll_by_av; if (nunroll > (unsigned) param_max_unroll_times) nunroll = param_max_unroll_times; + if ((average_num_loop_calls (loop) * (sreal) 1).to_int () + > (unsigned) param_max_unrolled_average_calls_x1) +nunroll = 0; if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); diff --git a/gcc/params.opt b/gcc/params.opt index f39e5d1a012..80605861223 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop. Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param Optimization The maximum number of instructions to consider to unroll in a loop. +-param=max-unrolled-average-calls-x1= +Common Joined UInteger Var(param_max_unrolled_average_calls_x1) Init(0) Param Optimization +The maximum number of calls to consider to unroll in a loop on average and multiply 1. + -param=max-unswitch-insns= Common Joined UInteger Var(param_max_unswitch_ins