I've found a major performance regression in gcc 4.0.0's optimization of the BYTEmark numsort benchmark at -O3. I've boiled it down to a testcase that I think will suit you. It outputs a single number: the number of iterations run (higher is better). On my machine I get roughly 900 iterations with 4.0.0 and around 1530 with 3.4.3.
Both were compiled and run in a Gentoo test partition, if that makes a difference:

3.4.3: gcc version 3.4.3-20050110 (Gentoo Linux 3.4.3.20050110-r2, ssp-3.4.3.20050110-0, pie-8.7.7)
4.0.0: gcc version 4.0.0 (Gentoo Linux 4.0.0)

--
Summary: BYTEmark numsort: performance regression 3.4.3 -> 4.0.0 with -O3 optimization
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jbucata at tulsaconnect dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485