Hello,

this is in a sense a continuation of
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29818 , the discussion on
performance.

Here I present performance numbers obtained with widely available GPL'ed
code: fftw-3.1.2.

I did the following:

1) built gcc-3.4.6;
2) ran this command line 10 times:

  /usr/bin/time /maxtor5/sergei/AppsFromScratchWD/build/fftw-3.1.2/tests/bench --speed if524288 -v4 -oexhaustive

   ('fftw-3.1.2/tests/bench' comes with fftw-3.1.2);
3) built gcc-4.1.1;
4) repeated step 2).

Here are the results.

gcc-3.4.6:

Problem: if524288, setup: 30.90 s, time: 88.12 ms, ``mflops'': 565.2
31.26user 0.21system 0:31.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5107minor)pagefaults 0swaps
Problem: if524288, setup: 30.90 s, time: 88.33 ms, ``mflops'': 563.86
31.32user 0.21system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5136minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.51 ms, ``mflops'': 562.76
31.20user 0.24system 0:31.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5134minor)pagefaults 0swaps
Problem: if524288, setup: 30.93 s, time: 88.49 ms, ``mflops'': 562.86
31.41user 0.20system 0:31.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.90 s, time: 88.55 ms, ``mflops'': 562.45
31.35user 0.22system 0:31.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5133minor)pagefaults 0swaps
Problem: if524288, setup: 31.25 s, time: 90.50 ms, ``mflops'': 550.37
82.48user 0.46system 1:23.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13044minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.11 ms, ``mflops'': 565.29
31.24user 0.21system 0:31.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.29 ms, ``mflops'': 564.15
31.25user 0.24system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5134minor)pagefaults 0swaps
Problem: if524288, setup: 30.85 s, time: 87.81 ms, ``mflops'': 567.2
31.26user 0.21system 0:31.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.71 ms, ``mflops'': 561.45
87.62user 0.44system 1:28.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13294minor)pagefaults 0swaps

gcc-4.1.1:

Problem: if524288, setup: 32.13 s, time: 91.64 ms, ``mflops'': 543.53
32.51user 0.23system 0:33.01elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5114minor)pagefaults 0swaps
Problem: if524288, setup: 32.11 s, time: 92.67 ms, ``mflops'': 537.45
84.25user 0.45system 1:25.31elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13295minor)pagefaults 0swaps
Problem: if524288, setup: 32.16 s, time: 92.33 ms, ``mflops'': 539.44
84.84user 0.46system 1:25.94elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13301minor)pagefaults 0swaps
Problem: if524288, setup: 32.18 s, time: 92.54 ms, ``mflops'': 538.22
85.41user 0.49system 1:27.18elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13299minor)pagefaults 0swaps
Problem: if524288, setup: 32.19 s, time: 91.40 ms, ``mflops'': 544.91
32.54user 0.22system 0:33.03elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5139minor)pagefaults 0swaps
Problem: if524288, setup: 32.17 s, time: 92.60 ms, ``mflops'': 537.9
91.29user 0.45system 1:32.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13301minor)pagefaults 0swaps
Problem: if524288, setup: 32.20 s, time: 91.83 ms, ``mflops'': 542.37
32.60user 0.24system 0:33.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5140minor)pagefaults 0swaps
Problem: if524288, setup: 32.15 s, time: 91.82 ms, ``mflops'': 542.42
32.60user 0.22system 0:33.04elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5138minor)pagefaults 0swaps
Problem: if524288, setup: 32.16 s, time: 91.37 ms, ``mflops'': 545.12
32.54user 0.23system 0:32.99elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5140minor)pagefaults 0swaps
Problem: if524288, setup: 32.11 s, time: 91.24 ms, ``mflops'': 545.89
32.48user 0.21system 0:32.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5141minor)pagefaults 0swaps

IMO the difference in favor of gcc-3.4.6 is visible to the naked eye (see,
for example, ``mflops''; larger numbers are better). Say, let's compare the
worst numbers:

gcc-3.4.6 : 550.37
gcc-4.1.1 : 537.45

Even the worst gcc-3.4.6 run is faster than eight of the ten gcc-4.1.1 runs.

I think it's worth porting the gcc-3.4.6 x86 optimization engine to the
gcc-4.* series.

-- 
           Summary: gcc-4.1.1 generates consistently worse performing SSE
                    code than gcc-3.4.6
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sergstesh at yahoo dot com
 GCC build triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep
                    26 12:41
  GCC host triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep
                    26 12:41
GCC target triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep
                    26 12:41

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874
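
For readers who want more than the worst-case comparison, the ``mflops'' figures quoted in this report can be summarized with a short script. This is only a sketch and not part of the original measurements: the values are copied by hand from the benchmark output in the report, and the helper names (MFLOPS, summarize, relative_slowdown) are made up for illustration.

```python
from statistics import mean

# "mflops" values copied by hand from the bench output in this report
# (10 runs per compiler); larger is better.
MFLOPS = {
    "gcc-3.4.6": [565.2, 563.86, 562.76, 562.86, 562.45,
                  550.37, 565.29, 564.15, 567.2, 561.45],
    "gcc-4.1.1": [543.53, 537.45, 539.44, 538.22, 544.91,
                  537.9, 542.37, 542.42, 545.12, 545.89],
}


def summarize(values):
    """Return worst, best and mean mflops for one compiler's runs."""
    return {"worst": min(values), "best": max(values), "mean": mean(values)}


def relative_slowdown(baseline, candidate):
    """Fractional drop of the candidate's mean mflops versus the baseline's."""
    return 1.0 - mean(candidate) / mean(baseline)


if __name__ == "__main__":
    for compiler, values in MFLOPS.items():
        s = summarize(values)
        print(f"{compiler}: worst={s['worst']:.2f} "
              f"best={s['best']:.2f} mean={s['mean']:.2f}")
    drop = relative_slowdown(MFLOPS["gcc-3.4.6"], MFLOPS["gcc-4.1.1"])
    print(f"gcc-4.1.1 mean mflops is {drop:.1%} below gcc-3.4.6")
```

With these 20 runs, the mean for gcc-4.1.1 comes out a few percent below the mean for gcc-3.4.6, which agrees with the worst-case comparison given in the report.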