------- Comment #39 from lucier at math dot purdue dot edu 2008-12-06 16:37 -------
I may have narrowed down the problem a bit.

With this compiler (revision 118491):

pythagoras-277% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061105 (experimental)

one gets (on a faster machine than previous reports)

(time (direct-fft-recursive-4 a table))
    133 ms real time
    140 ms cpu time (140 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

With this compiler (revision 118474):

pythagoras-24% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061104 (experimental)

one gets

(time (direct-fft-recursive-4 a table))
    116 ms real time
    108 ms cpu time (108 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

and you see the typical problem with assembly code from direct.i with the later
compiler.

Paolo may have been right about fwprop, this patch was installed that day:

Author: bonzini
Date: Sat Nov 4 08:36:45 2006
New Revision: 118475

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
Log:
2006-11-03  Paolo Bonzini  <[EMAIL PROTECTED]>
            Steven Bosscher  <[EMAIL PROTECTED]>

	* fwprop.c: New file.
	* Makefile.in: Add fwprop.o.
	* tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
	* passes.c (init_optimization_passes): Schedule forward propagation.
	* rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
	parameter.
	* timevar.def (TV_FWPROP): New.
	* common.opt (-fforward-propagate): New.
	* opts.c (decode_options): Enable forward propagation at -O2.
	* gcse.c (one_cprop_pass): Do not run local cprop unless touching jumps.
	* cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
	canon_for_address, table_size): Remove.
	(new_basic_block, insert, remove_from_table): Remove references to
	table_size.
	(fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
	simplification loop more straightforward by not calling fold_rtx
	recursively.
	(equiv_constant): Move here a small part of fold_rtx_subreg, do not
	call fold_rtx. Call avoid_constant_pool_reference to process MEMs.
	* recog.c (canonicalize_change_group): New.
	* recog.h (canonicalize_change_group): New.
	* doc/invoke.texi (Optimization Options): Document fwprop.
	* doc/passes.texi (RTL passes): Document fwprop.

Added:
    trunk/gcc/fwprop.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/common.opt
    trunk/gcc/cse.c
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/passes.texi
    trunk/gcc/gcse.c
    trunk/gcc/opts.c
    trunk/gcc/passes.c
    trunk/gcc/recog.c
    trunk/gcc/recog.h
    trunk/gcc/rtlanal.c
    trunk/gcc/timevar.def
    trunk/gcc/tree-pass.h

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
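Since the ChangeLog above adds -fforward-propagate and enables it at -O2, one possible way to check the fwprop suspicion would be to build direct.i twice with the later compiler, once with the pass disabled, and compare the assembly. The following is only a sketch under assumptions: the -O2 base flags, the output file names, and the compare_fwprop helper are illustrative and not taken from the report, and the compiler path simply reuses the install prefix shown in the gcc -v output.

```shell
#!/bin/sh
# Sketch: compare assembly for direct.i with and without forward propagation.
# Assumptions (not from the report): -O2 stands in for the unknown original
# flags, and the output file names are arbitrary.
GCC="${GCC:-/tmp/lucier/install/bin/gcc}"
SRC="${SRC:-direct.i}"
BASE_FLAGS="${BASE_FLAGS:--O2}"

# Hypothetical helper: emit assembly twice and diff the results.
compare_fwprop() {
    # BASE_FLAGS is left unquoted on purpose so several flags can be passed.
    "$GCC" $BASE_FLAGS -S "$SRC" -o with-fwprop.s || return 1
    "$GCC" $BASE_FLAGS -fno-forward-propagate -S "$SRC" -o without-fwprop.s || return 1
    # diff exits 1 when the files differ, which is the interesting case here;
    # only an exit status above 1 signals a real error.
    diff -u without-fwprop.s with-fwprop.s > fwprop.diff
    [ $? -le 1 ]
}

if [ -x "$GCC" ] && [ -f "$SRC" ]; then
    compare_fwprop && echo "differences (if any) written to fwprop.diff"
else
    echo "skipping: need an executable $GCC and a readable $SRC" >&2
fi
```

If the code built with -fno-forward-propagate runs at the speed seen with revision 118474, that would support the fwprop explanation; if not, other parts of the same commit (for example the cse.c changes) would remain candidates.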