>>Funnily enough, the patched MacLucasFFTW runs slower on the QS20 than on
>>the PS3, and during execution it uses only 8 SPUs of the 16 available!?
>>I tried calling fftw_cell_set_nspe(16), but FFTW still runs on at most
>>8 SPEs.
>
> FFTW may support 16 SPEs.
>
>>Is this an FFTW limitation?
>>Any ideas?
>
> See:
>
> http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/architektur_und_leistungsanalyse_von_hochleistungsrechnern/cell//matmul/
>
> Maybe numactl is your answer?
>
> ./cell/fftw-cell.h:#define MAX_NSPE 16
>
This limit is also in fftw-3.2.1/cell/cell.c:

static void set_default_nspe(void)
{
     if (nspe < 0) {
          /* set NSPE to the maximum of 8 and the number of physical
             SPEs.  A two-processor Cell blade reports 16 SPEs, but we
             only want to use one processor by default. */
#ifdef HAVE_LIBSPE2
          int phys = spe_cpu_info_get(SPE_COUNT_PHYSICAL_SPES, -1);
#else
          int phys = spe_count_physical_spes();
#endif
          if (phys > 8)
               phys = 8;
          X(cell_set_nspe)(phys);
     }
}

After I changed 8 to 16, it runs on all available SPUs.
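
For illustration only, the capping logic can be written with the limit as a parameter instead of a hard-coded 8. This is a minimal sketch, not FFTW code: clamp_nspe and its cap argument are hypothetical names, and the physical SPE count is passed in rather than queried from libspe.

```c
#include <assert.h>

/* Hypothetical helper, not part of FFTW: limit the number of SPEs
   reported by the system to a caller-chosen cap. */
int clamp_nspe(int phys, int cap)
{
    assert(cap > 0);               /* a non-positive cap is a caller error */
    return (phys > cap) ? cap : phys;
}
```

With a cap of 8 a blade reporting 16 SPEs is limited to 8, which matches the stock behaviour; with a cap of 16 all of them are used.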
I ran
    time ./MacLucasFFTW 32582657
with MacLucasFFTW set to terminate once j >= 10000 (and to printf every
1000 iterations instead of every 100, since big numbers otherwise produce
too much output).
The result was 10 minutes 24 seconds.
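
As a rough extrapolation from that timing, the sketch below scales the partial run up to a full Lucas-Lehmer test. It assumes the timed run covered about 10000 iterations, that the time per iteration stays constant, and that startup cost is negligible; the function name is made up for this example. A full test of 2^p - 1 takes p - 2 squarings.

```c
/* Hypothetical helper: extrapolate the wall-clock time (in days) of a full
   Lucas-Lehmer test from a timed partial run.  Assumes constant time per
   iteration and ignores startup overhead. */
double estimate_full_run_days(double sample_seconds, long sample_iters,
                              long exponent)
{
    double per_iter = sample_seconds / (double)sample_iters;
    double total_iters = (double)(exponent - 2);   /* p - 2 squarings */
    return per_iter * total_iters / 86400.0;       /* seconds per day */
}
```

Under those assumptions, estimate_full_run_days(624.0, 10000, 32582657) comes out at roughly 23.5 days for the whole exponent.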
I think this program is ready for some serious large prime hunting.
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime