GCC trunk r154141 appears to generate slower code for a convolution loop than previous stable releases do. The slowdown was also observed with revision 153048.

Here are some average timings on an Intel E5320 clocked at 1.86 GHz with 4 MB of L2 cache, running Debian GNU/Linux with a 2.6.26 kernel.

* with -O2 -march=native
  GCC 4.3.2                 8239 ms
  GCC-4.4.2                 8102 ms
  GCC-snapshot-20091105     9347 ms
  GCC-trunk-r154141         9343 ms

* with -O2
  GCC 4.3.2                 8128 ms
  GCC-4.4.2                 8158 ms
  GCC-snapshot-20091105     9824 ms
  GCC-trunk-r154141         9828 ms

* with -O1
  GCC 4.3.2                20926 ms
  GCC-4.4.2                 8277 ms
  GCC-snapshot-20091105     9369 ms
  GCC-trunk-r154141         9375 ms

* with -O0
  GCC 4.3.2                34061 ms
  GCC-4.4.2                34241 ms
  GCC-snapshot-20091105    34903 ms
  GCC-trunk-r154141        34910 ms

GCC was configured with:
configure --prefix=/export/home/nicolas/gcc/trunk-install --enable-languages=c --disable-multilib --disable-bootstrap --enable-checking=release

I haven't been able to track down the origin of the performance difference.
Note that the data are not initialized in the attached code, as the slowdown is observed whether they are or not.

---BEGIN code---

#define N 1024*512
#define M 512
#define ITER 16

double in[N];
double H[M];
double vH[N];

int
main ( int argc, char **argv )
{
  int i, j, k;

  for ( i=0; i<ITER; ++i )
    for ( j=0; j<N; ++j )
      for ( k=0; (k<M)&&(k<=j); ++k )
        vH[j] += H[k]*in[j-k];

  return (int) vH[argc];
}

---END code---

-- 
Summary: Performance regression in convolution loop optimization
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: nbenoit at tuxfamily dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42027
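
The report does not say how the average timings were collected. For anyone wanting to reproduce them from inside the program, here is a minimal sketch of one possible harness. It uses the same loop nest as the test case, but the function names (convolve, time_convolution_ms), the runs parameter, and the choice of clock() (CPU time rather than wall-clock time) are all assumptions for illustration, not the reporter's method.

```c
#include <stddef.h>
#include <time.h>

/* Same loop nest as the test case, with the array sizes passed in
   as parameters instead of the N and M macros. */
static void
convolve (const double *in, const double *h, double *out,
          size_t n, size_t m)
{
  size_t j, k;

  for (j = 0; j < n; ++j)
    for (k = 0; (k < m) && (k <= j); ++k)
      out[j] += h[k] * in[j - k];
}

/* Hypothetical helper: run `iters` passes of the kernel `runs` times
   and return the average CPU time of one run in milliseconds.
   clock() is standard C; it measures processor time, which is close
   to wall-clock time for a single-threaded, CPU-bound loop. */
static double
time_convolution_ms (const double *in, const double *h, double *out,
                     size_t n, size_t m, int iters, int runs)
{
  double total_ms = 0.0;
  int r, i;

  for (r = 0; r < runs; ++r)
    {
      clock_t start = clock ();

      for (i = 0; i < iters; ++i)
        convolve (in, h, out, n, m);

      total_ms += (double) (clock () - start) * 1000.0 / CLOCKS_PER_SEC;
    }

  return total_ms / runs;
}
```

Calling time_convolution_ms with the three global arrays, n = 1024*512, m = 512 and iters = 16 matches the sizes of the test case. Building the same file with each compiler and option set then gives per-compiler figures to compare. Note that passing the arrays as pointers may change which optimizations apply compared with the original global-array version, so the absolute numbers need not match the ones above.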