Nicolas Neuss <[EMAIL PROTECTED]> writes:

>               C (dynamic)  C (static)  CMUCL
> P2 (400 MHz)  4.3 sec      1.1 sec     1.5 sec
> P4 (2.4 GHz)  0.59 sec     0.30 sec    0.28 sec
>
> I would suspect that you might observe similar results.
Unfortunately, I have to correct myself. I have now redone those measurements, and the numbers have changed (at least for the Pentium 4). In particular, with an improved ordering of the loops in the dynamic C code, the speed difference between the two C versions is much smaller. The code is here (each run measures 100 stencil applications):

http://cox.iwr.uni-heidelberg.de/~neuss/misc/stencil-dynamic.c

    gcc -O3 stencil-dynamic.c; time a.out
    1.000000

    real    0m3.412s
    user    0m3.380s
    sys     0m0.010s

http://cox.iwr.uni-heidelberg.de/~neuss/misc/stencil-static.c

    gcc -O3 stencil-static.c; time a.out
    1.000000

    real    0m3.011s
    user    0m2.970s
    sys     0m0.010s

http://cox.iwr.uni-heidelberg.de/~neuss/misc/stencil.lisp

    (time (test *nine-point-stencil*))
    ; Evaluation took:
    ;   2.84 seconds of real time
    ;   2.83 seconds of user run time
    ;   0.01 seconds of system run time
    ;   6,836,451,692 CPU cycles
    ;   0 page faults and
    ;   9,248,856 bytes consed.
    ;
    1.0d0

So I no longer expect that Ryan will see substantial gains, unless runtime analysis of the polynomial allows things to be simplified a lot.

Yours, Nicolas.
