>>Funnily enough, the patched MacLucasFFTW runs slower on the QS20 than on
>>the PS3, and during execution it uses only 8 of the 16 available SPUs!?
>>I tried calling fftw_cell_set_nspe(16), but FFTW still runs on a
>>maximum of 8 SPEs.
> FFTW may support 16 SPEs.
>>Is it an FFTW limitation?
>>Any ideas?
>
> See:
>
> http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/architektur_und_leistungsanalyse_von_hochleistungsrechnern/cell//matmul/
>
> Maybe numactl is your answer?
>
> ./cell/fftw-cell.h:#define MAX_NSPE 16
>

There is also a limit in fftw-3.2.1/cell/cell.c:

static void set_default_nspe(void)
{
     if (nspe < 0) {
          /* set NSPE to the maximum of 8 and the number of physical
             SPEs.  A two-processor Cell blade reports 16 SPEs, but we
             only want to use one processor by default. */
#ifdef HAVE_LIBSPE2
          int phys = spe_cpu_info_get(SPE_COUNT_PHYSICAL_SPES, -1);
#else
          int phys = spe_count_physical_spes();
#endif
          if (phys > 8)
               phys = 8;
          X(cell_set_nspe)(phys);
     }
}


After I changed the 8 to 16 there, it runs on all available SPUs.
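
As a minimal sketch of what that edit does, the clamp from
set_default_nspe() can be isolated in a standalone function. The names
default_nspe and NSPE_CAP are hypothetical and not part of FFTW; the
physical SPE count is passed in as an argument so the snippet builds
without libspe.

```c
#include <assert.h>

/* Hypothetical stand-in for the hard-coded 8 in set_default_nspe();
   NSPE_CAP is not an FFTW macro. */
#define NSPE_CAP 16

/* Return the default number of SPEs to use: the physical count,
   limited to cap.  Mirrors the "if (phys > 8) phys = 8;" logic. */
static int default_nspe(int phys, int cap)
{
     assert(cap > 0);
     return phys > cap ? cap : phys;
}
```

With the original cap, default_nspe(16, 8) gives 8, which matches what I
saw on the QS20; with default_nspe(16, NSPE_CAP) all 16 are used, and a
machine that reports fewer SPEs is unaffected.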

I then ran

time ./MacLucasFFTW 32582657

with MacLucasFFTW set to terminate once j >= 10000 (and to printf every
1000 iterations instead of every 100, since with big numbers that is too
much output).

The result was 10 minutes 24 seconds.
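
For a rough idea of what that timing means for a full run, here is a
small sketch of the arithmetic. It assumes that j counts Lucas-Lehmer
iterations, that a full test of 2^p - 1 takes p - 2 of them, and that
startup time is negligible; the function names are my own for
illustration and are not part of MacLucasFFTW.

```c
#include <assert.h>

/* Seconds per iteration from a timed partial run. */
static double secs_per_iter(double elapsed_secs, long iters)
{
     assert(iters > 0);
     return elapsed_secs / (double) iters;
}

/* Estimated days for a full Lucas-Lehmer test of 2^p - 1,
   assuming p - 2 iterations at the measured rate. */
static double full_test_days(long p, double elapsed_secs, long iters)
{
     return (double) (p - 2) * secs_per_iter(elapsed_secs, iters) / 86400.0;
}
```

Calling full_test_days(32582657, 624.0, 10000) gives about 23.5 days,
that is, about 62 ms per iteration under the assumptions above.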

I think this program is ready for some serious large prime hunting.
_______________________________________________
Prime mailing list
Prime@hogranch.com
http://hogranch.com/mailman/listinfo/prime