http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47167

           Summary: Performance regression in numerical code
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: mar...@mpa-garching.mpg.de


Created attachment 22897
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22897
test case

When compiling the attached test case on a machine with a Core 2 Duo E8500 CPU
and 64-bit Linux using

gcc -O2 -fomit-frame-pointer testcase.i -lm

the results with gcc 4.5.1 are

Testing map analysis accuracy.
lmax=2047, 0 iterations, spin=0

Testing ECP grid (4096 rings, 4096 pixels/ring, 16777216 pixels)

iteration 0:
wall time for alm2map: 8.811477s
wall time for map2alm: 9.204556s
component 0: rms 1.390734e-13, maxerr 1.582512e-12

However, with current trunk one obtains

Testing map analysis accuracy.
lmax=2047, 0 iterations, spin=0

Testing ECP grid (4096 rings, 4096 pixels/ring, 16777216 pixels)

iteration 0:
wall time for alm2map: 9.518667s
wall time for map2alm: 9.780509s
component 0: rms 1.390734e-13, maxerr 1.582512e-12

The numerical results are identical, but the code generated by the more recent
compiler is noticeably slower: about 8% for alm2map and about 6% for map2alm.

Reducing the test case is unfortunately not trivial; the computational hot
spots are located in pshtd_inner_loop() and Ylmgen_recalc_Ylm_sse2().

Please let me know if I can provide further information.
