https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110713
            Bug ID: 110713
           Summary: Fatigue2 runs twice as fast with increased inlining limits
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast -march=native -fdump-tree-all-details-blocks -fdump-rtl-all-details -fdump-ipa-all-details --param max-inline-insns-auto=110 ; perf stat ./a.out >/dev/null

 Performance counter stats for './a.out':

        13937.07 msec task-clock:u                #    1.000 CPUs utilized
               0      context-switches:u          #    0.000 /sec
               0      cpu-migrations:u            #    0.000 /sec
             138      page-faults:u               #    9.902 /sec
     67489472294      cycles:u                    #    4.842 GHz                      (83.33%)
        38791427      stalled-cycles-frontend:u   #    0.06% frontend cycles idle     (83.33%)
         2351353      stalled-cycles-backend:u    #    0.00% backend cycles idle      (83.33%)
    147268347462      instructions:u              #    2.18  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.33%)
      5705431257      branches:u                  #  409.371 M/sec                    (83.35%)
        13638274      branch-misses:u             #    0.24% of all branches          (83.35%)

    13.941876147 seconds time elapsed

    13.933226000 seconds user
     0.003999000 seconds sys

jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast -march=native -fdump-tree-all-details-blocks -fdump-rtl-all-details -fdump-ipa-all-details ; perf stat ./a.out >/dev/null

 Performance counter stats for './a.out':

        31300.68 msec task-clock:u                #    1.000 CPUs utilized
               0      context-switches:u          #    0.000 /sec
               0      cpu-migrations:u            #    0.000 /sec
             138      page-faults:u               #    4.409 /sec
    150619261261      cycles:u                    #    4.812 GHz                      (83.32%)
       779861463      stalled-cycles-frontend:u   #    0.52% frontend cycles idle     (83.33%)
         4695025      stalled-cycles-backend:u    #    0.00% backend cycles idle      (83.34%)
    242822794319      instructions:u              #    1.61  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.34%)
     13542051898      branches:u                  #  432.644 M/sec                    (83.34%)
        14587945      branch-misses:u             #    0.11% of all branches          (83.34%)

    31.301169341 seconds time elapsed

    31.296826000 seconds user
     0.003999000 seconds sys

The main difference is the inlining of generalized_hookes_law. While the
function looks quite big at release_ssa time, after vectorization it becomes
loopless, so inlining it is a big win.

      function generalized_hookes_law (strain_tensor, lambda, mu) result (stress_tensor)
!
!     Author:       Dr. John K. Prentice
!     Affiliation:  Quetzal Computational Associates, Inc.
!     Dates:        28 November 1997
!
!     Purpose:      Apply the generalized Hooke's law for elasticity to the strain tensor
!                   (or strain rate tensor) to compute the stress tensor (or stress rate
!                   tensor)
!
!############################################################################################
!
!     Input:
!
!       strain_tensor           [selected_real_kind(15,90), dimension(3,3)]
!                               stress tensor
!
!       lambda                  [selected_real_kind(15,90)]
!                               Lame constant Lambda
!
!       mu                      [selected_real_kind(15,90)]
!                               Lame constant mu
!
!     Output:
!
!       stress_tensor           [selected_real_kind(15,90), dimension(3,3)]
!                               stress tensor
!
!############################################################################################
!
!
!=========== formal variables =============
!
      real (kind = LONGreal), dimension(:,:), intent(in) :: strain_tensor
      real (kind = LONGreal), intent(in) :: lambda, mu
      real (kind = LONGreal), dimension(3,3) :: stress_tensor
!
!========== internal variables ============
!
      real (kind = LONGreal), dimension(6) ::generalized_strain_vector,                       &
                                             generalized_stress_vector
      real (kind = LONGreal), dimension(6,6) :: generalized_constitutive_tensor
      integer :: i
!
!      construct the generalized constitutive tensor for elasticity
!
      generalized_constitutive_tensor(:,:) = 0.0_LONGreal
      generalized_constitutive_tensor(1,1) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(1,2) = lambda
      generalized_constitutive_tensor(1,3) = lambda
      generalized_constitutive_tensor(2,1) = lambda
      generalized_constitutive_tensor(2,2) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(2,3) = lambda
      generalized_constitutive_tensor(3,1) = lambda
      generalized_constitutive_tensor(3,2) = lambda
      generalized_constitutive_tensor(3,3) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(4,4) = mu
      generalized_constitutive_tensor(5,5) = mu
      generalized_constitutive_tensor(6,6) = mu
!
!      construct the generalized strain vector (using double index notation)
!
      generalized_strain_vector(1) = strain_tensor(1,1)
      generalized_strain_vector(2) = strain_tensor(2,2)
      generalized_strain_vector(3) = strain_tensor(3,3)
      generalized_strain_vector(4) = strain_tensor(2,3)
      generalized_strain_vector(5) = strain_tensor(1,3)
      generalized_strain_vector(6) = strain_tensor(1,2)
!
!      compute the generalized stress vector
!
      do i = 1, 6
          generalized_stress_vector(i) = dot_product(generalized_constitutive_tensor(i,:),    &
                                                     generalized_strain_vector(:))
      end do
!
!      update the stress tensor
!
      stress_tensor(1,1) = generalized_stress_vector(1)
      stress_tensor(2,2) = generalized_stress_vector(2)
      stress_tensor(3,3) = generalized_stress_vector(3)
      stress_tensor(2,3) = generalized_stress_vector(4)
      stress_tensor(1,3) = generalized_stress_vector(5)
      stress_tensor(1,2) = generalized_stress_vector(6)
      stress_tensor(3,2) = stress_tensor(2,3)
      stress_tensor(3,1) = stress_tensor(1,3)
      stress_tensor(2,1) = stress_tensor(1,2)
!
      end function generalized_hookes_law
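
To make the "gets loopless" point concrete, here is a hand-written sketch of what the arithmetic in generalized_hookes_law reduces to once the six dot_product iterations are unrolled and the zero entries of generalized_constitutive_tensor are folded away. This is not GCC output and is not part of fatigue2.f90: the module name hookes_law_sketch and the function name hookes_law_unrolled are hypothetical, and LONGreal is assumed to be selected_real_kind(15,90) as the function's header comment states.

```fortran
! Illustrative sketch only: hookes_law_sketch and hookes_law_unrolled are
! hypothetical names, not part of fatigue2.f90 and not compiler output.
module hookes_law_sketch
   implicit none
   ! Assumed to match the benchmark's kind, per its header comment.
   integer, parameter :: LONGreal = selected_real_kind(15, 90)
contains
   ! Loop-free form of the 6x6 matrix-vector product in
   ! generalized_hookes_law: only 12 of the 36 tensor entries are nonzero.
   pure function hookes_law_unrolled(strain_tensor, lambda, mu) result(stress_tensor)
      real(kind=LONGreal), dimension(:,:), intent(in) :: strain_tensor
      real(kind=LONGreal), intent(in) :: lambda, mu
      real(kind=LONGreal), dimension(3,3) :: stress_tensor
      real(kind=LONGreal) :: lambda_trace

      ! lambda * (e11 + e22 + e33) is shared by the three normal components.
      lambda_trace = lambda * (strain_tensor(1,1) + strain_tensor(2,2)   &
                               + strain_tensor(3,3))

      ! Rows 1-3: (lambda + 2 mu) e_ii + lambda * (other two normal strains)
      !         = lambda_trace + 2 mu e_ii
      stress_tensor(1,1) = lambda_trace + 2.0_LONGreal * mu * strain_tensor(1,1)
      stress_tensor(2,2) = lambda_trace + 2.0_LONGreal * mu * strain_tensor(2,2)
      stress_tensor(3,3) = lambda_trace + 2.0_LONGreal * mu * strain_tensor(3,3)

      ! Rows 4-6 are diagonal: each shear term is mu times one strain entry.
      stress_tensor(2,3) = mu * strain_tensor(2,3)
      stress_tensor(1,3) = mu * strain_tensor(1,3)
      stress_tensor(1,2) = mu * strain_tensor(1,2)

      ! Mirror the upper triangle, as the original function does.
      stress_tensor(3,2) = stress_tensor(2,3)
      stress_tensor(3,1) = stress_tensor(1,3)
      stress_tensor(2,1) = stress_tensor(1,2)
   end function hookes_law_unrolled
end module hookes_law_sketch
```

Calling hookes_law_unrolled(strain, lambda, mu) on a 3x3 array should agree with generalized_hookes_law up to rounding, since only the order of the additions differs. The body is a few multiply-adds with no control flow, which is consistent with the lower instruction and branch counts in the run with the raised inlining limit.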