>> Funny enough, the patched MacLucasFFTW runs slower on the QS20 than on
>> the PS3, and during execution it uses only 8 of the 16 available SPUs!
>> I tried calling fftw_cell_set_nspe(16), but FFTW still runs on at most
>> 8 SPEs.
>
> FFTW may support 16 SPEs.
>
>> Is it an FFTW limitation?
>> Any ideas?
>
> See:
>
> http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/architektur_und_leistungsanalyse_von_hochleistungsrechnern/cell//matmul/
>
> Is numactl your answer?
>
> ./cell/fftw-cell.h:#define MAX_NSPE 16
>
There is also a limit in fftw-3.2.1/cell/cell.c:

static void set_default_nspe(void)
{
     if (nspe < 0) {
          /* set NSPE to the maximum of 8 and the number of physical
             SPEs.  A two-processor Cell blade reports 16 SPEs, but we
             only want to use one processor by default. */
#ifdef HAVE_LIBSPE2
          int phys = spe_cpu_info_get(SPE_COUNT_PHYSICAL_SPES, -1);
#else
          int phys = spe_count_physical_spes();
#endif
          if (phys > 8)
               phys = 8;
          X(cell_set_nspe)(phys);
     }
}

After I changed 8 to 16, it runs on all available SPUs.

I then ran

    time ./MacLucasFFTW 32582657

with MacLucasFFTW set to terminate after j >= 10000 (and to printf every
1000 iterations instead of every 100, since with big numbers that is too
much output). The result was 10 minutes 24 seconds.

I think this program is ready for some serious large prime hunting.

_______________________________________________
Prime mailing list
Prime@hogranch.com
http://hogranch.com/mailman/listinfo/prime
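The clamp in set_default_nspe() quoted above can be sketched in isolation. This is only a minimal illustration, not FFTW code: the helper name default_nspe and its cap parameter are hypothetical, and treating a negative count as 0 is an assumption about how a failed SPE query might be handled.

```c
/* Hypothetical helper, not part of FFTW: choose a default SPE count
   from the reported physical count `phys`, never exceeding `cap`.
   In the quoted FFTW code the cap is the hard-coded constant 8. */
static int default_nspe(int phys, int cap)
{
    /* Assumption: a negative count means the query failed, so use 0. */
    if (phys < 0)
        return 0;

    /* Same clamp as "if (phys > 8) phys = 8;" with the cap made explicit. */
    return phys > cap ? cap : phys;
}
```

With a cap of 8, a blade reporting 16 SPEs gets 8, which matches the behaviour described above. Raising the cap to 16 lets the same blade use all 16, while a machine reporting fewer SPEs than the cap is unaffected either way.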