Nicolas Neuss <[EMAIL PROTECTED]> writes:

>               C (dynamic)  C (static)  CMUCL
> P2 (400 MHz)  4.3 sec      1.1 sec     1.5  sec
> P4 (2.4 GHz)  0.59 sec     0.30 sec    0.28 sec
> 
> I would suspect that you might observe similar results.

Unfortunately, I have to correct myself.  I have now redone those
measurements, and the numbers have changed (at least for the Pentium 4).
In particular, with an improved ordering of the loops in the dynamic C code,
the speed difference between the C versions is much smaller.
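
To illustrate why the loop ordering matters, here is a minimal sketch.  It
is not the code from stencil-dynamic.c: it assumes a square n-by-n grid
stored row-major in a flat array and a 3x3 weight array passed at run time,
and the function names are hypothetical.  With the column index in the
inner loop, memory is traversed with unit stride; swapping the two grid
loops gives the same result but steps through memory n elements at a time,
which is typically slower on cache-based CPUs.

```c
#include <stddef.h>

/* Hypothetical helper: weighted sum of the 3x3 neighbourhood of (i, j).
   w holds the nine weights row by row; src is an n-by-n row-major grid. */
static double stencil_at(ptrdiff_t n, const double *w, const double *src,
                         ptrdiff_t i, ptrdiff_t j)
{
    double sum = 0.0;
    for (ptrdiff_t di = -1; di <= 1; di++)
        for (ptrdiff_t dj = -1; dj <= 1; dj++)
            sum += w[(di + 1) * 3 + (dj + 1)] * src[(i + di) * n + (j + dj)];
    return sum;
}

/* Rows in the outer loop, columns in the inner loop: consecutive
   iterations touch neighbouring addresses (unit stride). */
void apply_stencil_rows_outer(ptrdiff_t n, const double *w,
                              const double *src, double *dst)
{
    for (ptrdiff_t i = 1; i < n - 1; i++)
        for (ptrdiff_t j = 1; j < n - 1; j++)
            dst[i * n + j] = stencil_at(n, w, src, i, j);
}

/* Same computation with the grid loops swapped: consecutive iterations
   are n elements apart in memory (stride n). Boundary cells are left
   untouched in both variants. */
void apply_stencil_cols_outer(ptrdiff_t n, const double *w,
                              const double *src, double *dst)
{
    for (ptrdiff_t j = 1; j < n - 1; j++)
        for (ptrdiff_t i = 1; i < n - 1; i++)
            dst[i * n + j] = stencil_at(n, w, src, i, j);
}
```

Both functions compute identical values; only the order in which the grid
is visited differs, so any timing difference between them comes from the
memory access pattern alone.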

The code is here (measures 100 stencil applications):

http://cox.iwr.uni-heidelberg.de/~neuss/misc/stencil-dynamic.c

gcc -O3 stencil-dynamic.c; time a.out
1.000000
real    0m3.412s
user    0m3.380s
sys     0m0.010s

http://cox.iwr.uni-heidelberg.de/~neuss/misc/stencil-static.c

gcc -O3 stencil-static.c; time a.out
1.000000
real    0m3.011s
user    0m2.970s
sys     0m0.010s

http://cox.iwr.uni-heidelberg.de/~neuss/misc/stencil.lisp

(time (test *nine-point-stencil*))

; Evaluation took:
;   2.84 seconds of real time
;   2.83 seconds of user run time
;   0.01 seconds of system run time
;   6,836,451,692 CPU cycles
;   0 page faults and
;   9,248,856 bytes consed.
; 
1.0d0

So I no longer expect that Ryan will see substantial gains, unless
runtime analysis of the polynomial allows things to be simplified a lot.

Yours, Nicolas.

