[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #65 from rguenther at suse dot de rguenther at suse dot de --- On Wed, 10 Dec 2014, burnus at gcc dot gnu.org wrote: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #64 from Tobias Burnus burnus at gcc dot gnu.org --- (In reply to Richard Biener from comment #63) Unfortunately for the testcase this doesn't allow moving the division at all and we are lucky that we have range information at all because of the fortran frontend casting 'n' to unsigned before dividing by it. If it helps and the semantic is preserved, there is no reason not to completely change what tree code the Fortran FE generates for loops. The attached patch helps, but only sofar as enabling moving the division out one loop level - if you add another nest the entry condition on it will prevent moving the division further. [I think one reason for the odd way tree code for loops is generated is: The current code makes it simple to permit loops which are always run once, as some Fortran 66 compilers did. For instance, DO i = 2, 1 would then be executed once. (Such loops are not permitted in F66 - and some compilers executed them once others zero times; since F77, such loops are permitted and executed zero times. Unsurprisingly, some old code from the 60s relies on the execute once feature.) g77 and some commercial compilers have a compile flag like -f66, gfortran hasn't and I don't think it ever will.] I wondered if DO i = 1, 1, 0 is valid (a step of zero). Note that DO i = 2, 1 wouldn't be executed once with the current generated code, only DO i = 1, 1 is, but with step == 0 it would divide by zero.
[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #66 from Salvatore Filippone sfilippone at uniroma2 dot it --- As far as I remember(In reply to rguent...@suse.de from comment #65) On Wed, 10 Dec 2014, burnus at gcc dot gnu.org wrote: Fortran 66 compilers did. For instance, DO i = 2, 1 would then be executed once. (Such loops are not permitted in F66 - and some compilers executed them once others zero times; since F77, such loops are permitted and executed zero times. Unsurprisingly, some old code from the 60s relies on the execute once feature.) g77 and some commercial compilers have a compile flag like -f66, gfortran hasn't and I don't think it ever will.] I wondered if DO i = 1, 1, 0 is valid (a step of zero). Note that DO i = 2, 1 wouldn't be executed once with the current generated code, only DO i = 1, 1 is, but with step == 0 it would divide by zero. Step 0 is not allowed (see e.g. F2003 Handbook, sect. 8.7.2.1.1). Salvatore
[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #61 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Richard Biener from comment #60) Ok, so I have a patch that teaches LIM to move the division by using the value-range information we now store. Which can't be done this way ... (the value-range is only valid in the place it was computed, not where LIM may move it to)
[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #62 from Richard Biener rguenth at gcc dot gnu.org --- Created attachment 34238 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=34238action=edit frontend patch So the issue is that the division is executed conditionally because it is placed after the loop entry check. The frontend generates if (step 0) { if (to from) goto exit_label; countm1 = (to - from) / step; } else { if (to from) goto exit_label; countm1 = (from - to) / -step; } which ends up optimized to (as step is known to be positive here): if (to from) { countm1 = (to - from) / step; ...loop... if we instead generate if (step 0) { countm1 = (to - from) / step; if (to from) goto exit_label; } else { countm1 = (from - to) / -step; if (to from) goto exit_label; } the division can be hoisted out of the outer loop (at the cost of computing countm1 even when the loop is not entered -- but that would have happened if the desired transform happened anyway). Note that if step 0 cannot be optimized we still have the same issue, so in theory we could generate if (step 0) { tem1 = (to - from); tem2 = step; } else { tem1 = (from - to); tem2 = -step; } countm1 = tem1 / tem2; if (step 0) { if (to from) goto exit_label; } else { if (to from) goto exit_label; } but hoisting the division with conditional computed operands is a lot more expensive (but possible for exactly two ops, and likely done). The runtime overhead from the above is one extra branch. If we do that then unfortunately jump threading undoes that change creating the first variant again.
[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #63 from Richard Biener rguenth at gcc dot gnu.org --- Created attachment 34239 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=34239action=edit LIM patch This is a patch to loop invariant motion I was playing with to use range information to move a stmt up to a place where we can determine the range information is still correct. Unfortunately for the testcase this doesn't allow moving the division at all and we are lucky that we have range information at all because of the fortran frontend casting 'n' to unsigned before dividing by it. Another possibility is to (finally) teach LIM to hoist code which conditional execution status needs to be preserved by guarding the hoisted code with that conditional code. Thus hoist a / b as either b != 0 ? a / b : 0(?) or as execution condition ? a / b : 0. Where for this case the condition of execution is probably more complex than testing the denominator for zero (thus testing the condition under which we have to preserve execution). Followup optimizations then eventually can get rid of that condition.
[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 --- Comment #64 from Tobias Burnus burnus at gcc dot gnu.org --- (In reply to Richard Biener from comment #63) Unfortunately for the testcase this doesn't allow moving the division at all and we are lucky that we have range information at all because of the fortran frontend casting 'n' to unsigned before dividing by it. If it helps and the semantic is preserved, there is no reason not to completely change what tree code the Fortran FE generates for loops. [I think one reason for the odd way tree code for loops is generated is: The current code makes it simple to permit loops which are always run once, as some Fortran 66 compilers did. For instance, DO i = 2, 1 would then be executed once. (Such loops are not permitted in F66 - and some compilers executed them once others zero times; since F77, such loops are permitted and executed zero times. Unsurprisingly, some old code from the 60s relies on the execute once feature.) g77 and some commercial compilers have a compile flag like -f66, gfortran hasn't and I don't think it ever will.]
[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #60 from Richard Biener rguenth at gcc dot gnu.org --- Ok, so I have a patch that teaches LIM to move the division by using the value-range information we now store.