Hi,

[email protected] wrote:
>Funny enough patched MacLucasFFTW works slower on QS20 (than PS3) and
>during the execution it uses only 8 SPUs form 16 available ! ?
>I tried using function fftw_cell_set_nspe(16) but still fftw runs on
>maximum 8 spes.

http://www.fftw.org/cell/cellblade/
>IBM QS20 Cell Blade
>IBM Cell Blade: 2 Cell Processors, 3.2GHz Cell Broadband Engine, IBM Cell SDK
>2.0, Linux 2.6.18, 8 SPEs available.
>The benchmark uses one processor only.

fftw uses 8 SPUs at most. I guess fftw_cell_set_nspe(6) or
fftw_cell_set_nspe(7) is best on your system as it is now.

w = ALLOC_DOUBLES(n/2);
fftw_cell_set_nspe(4);
forw = fftw_plan_dft_1d(n/2,x,x,FFTW_BACKWARD,FFTW_ESTIMATE);

Call fftw_cell_set_nspe() before the first call to fftw_plan_...().

http://netnews.gotdns.org/WallStreet/6351/gfn/lucas.ps3/MacLucasFFTW.ps3.6.patch

ps3$ time ./MacLucasFFTW 216091
M( 216091 )P, n = 16384, MacLucasFFTW v8.1 Ballester

real    4m17.501s
user    4m13.994s
sys     0m3.453s

ps3$ time ./MacLucasFFTW 859433
M( 859433 )P, n = 65536, MacLucasFFTW v8.1 Ballester

real    65m59.341s
user    64m26.565s
sys     1m25.307s

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 38.87   1442.65  1442.65   859430     1.68     1.68  normalize
 32.90   2663.58  1220.93   859431     1.42     1.42  rftfsub
 28.03   3703.96  1040.38                             fftw_cell_spe_wait_all

ps3$ time ./MacLucasFFTW 32582657 1 2097152 10001 2097152
^C

real    29m15.234s
user    28m38.115s
sys     0m35.813s

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 43.78    730.24   730.24    10010    72.95    72.95  normalize
 36.60   1340.67   610.43                             fftw_cell_spe_wait_all  <- fftw cost
 19.57   1667.10   326.43    10010    32.61    32.61  rftfsub

Shoichiro Yamada
[email protected]
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
