The following code (borrowed from http://gcc.gnu.org/ml/gcc/2008-05/msg00134.html):
integer(8), parameter :: l = z'5fe6eb3be0000000' integer, parameter :: ni = 3 integer :: i, j, n integer(8) :: k real(8) :: a, b, e, m, s equivalence (b, k) a = 1.0d0 e = epsilon(1.0)/2.0d0**4 m = 0.0d0 s = 0.0d0 n = 0 do n = n + 1 b = a k = l - ishft(k, -1_8) do i = 1, ni b = b*(1.5-(0.5*a)*b*b) end do b = b + b*(0.5-(0.5*a)*b*b) ! b = 1.0d0/sqrt(a) m = max(m, abs(a*b*b - 1.0d0)) s = s + abs(a*b*b - 1.0d0) a = a + e if (a == 2.0d0) exit end do print *, n, m/epsilon(a), s/(n*epsilon(a)) end gives the following timings: [ibook-dhum] bug/timing% gfc -m64 -O3 rsqrt_8_nr_v1_s.f90 [ibook-dhum] bug/timing% time a.out 134217728 2.0000000000000000 0.36966567113995552 2.662u 0.008s 0:02.67 99.6% 0+0k 0+1io 0pf+0w [ibook-dhum] bug/timing% gfc -m32 -O3 rsqrt_8_nr_v1_s.f90 [ibook-dhum] bug/timing% time a.out 134217728 2.0000000000000000 0.36966567113995552 7.401u 0.023s 0:07.42 100.0% 0+0k 0+0io 0pf+0w For comparison the following code: integer :: n real(8) :: a, b, e, m, s a = 1.0d0 e = epsilon(1.0)/2.0d0**4 s = 0.0d0 m = 0.0d0 n = 0 do n = n + 1 b = 1.0d0/sqrt(a) s = s + abs(a*b*b - 1.0d0) m = max(m, abs(a*b*b - 1.0d0)) a = a + e if (a == 2.0d0) exit end do print *, n, m/epsilon(a), s/(n*epsilon(a)) end gives [ibook-dhum] bug/timing% gfc -m64 -O3 rsqrt_8_s.f90 [ibook-dhum] bug/timing% time a.out 134217728 1.00000000000000000 0.49419290572404861 5.469u 0.002s 0:05.47 99.8% 0+0k 0+0io 0pf+0w [ibook-dhum] bug/timing% gfc -m32 -O3 rsqrt_8_s.f90 [ibook-dhum] bug/timing% time a.out 134217728 1.00000000000000000 0.49419290572404861 5.475u 0.020s 0:05.49 100.0% 0+0k 0+0io 0pf+0w Note that the later code is vectorized, while the former one is not. -- Summary: Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dominiq at lps dot ens dot fr GCC build triplet: i686-apple-darwin9 GCC host triplet: i686-apple-darwin9 GCC target triplet: i686-apple-darwin9 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241