https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30409
--- Comment #7 from kargl at gcc dot gnu.org ---
The attached testcase uses xmin and xmax uninitialized. After setting xmin = 0 and xmax = 1, and adding z(1) to the print statements to prevent the inner loop from being optimized away, I see the following:

% gfcx -o z -O0 a.f90 && ./z
 time 1: 1.78299993E-02 7249751.00
 time 2: 6.37416887 7249751.00

% gfcx -o z -O1 a.f90 && ./z
 time 1: 1.37590002E-02 7249751.00
 time 2: 6.36764479 7249751.00

% gfcx -o z -O2 a.f90 && ./z
 time 1: 1.23690004E-02 7249751.00
 time 2: 1.85729897 7249751.00

% gfcx -o z -O3 a.f90 && ./z
 time 1: 2.43199989E-03 7249751.00
 time 2: 1.85660207 7249751.00

% gfcx -o z -Ofast a.f90 && ./z
 time 1: 3.63499997E-03 7249751.50
 time 2: 0.621210992 7249751.50

So the timings improve with optimization. The -fdump-tree-original output still shows a temporary variable being generated for the actual argument 1/y in the second set of nested loops. The -fdump-tree-optimized output is fairly difficult for me to decipher, but it appears that 1/y is not hoisted out of the inner loop.