Wojciech Florek <[EMAIL PROTECTED]> writes:

>I've compiled Mlucas 2.7y on R4600 SGI IRIX 6.5 machine with 
>MIPSpro Compilers: Version 7.2.1.3m .
>Machine data:
>
>   Powered by Silicon Graphics
>                                      
>   CPU MIPS R4600 Processor Chip Revision: 2.0
>   FPU MIPS R4600 Floating Point Coprocessor Revision: 2.0
>   1 133 MHZ IP22 Processor
>   Main memory size 128 Mbytes
>   Secondary unified instruction/data cache size 512 Kbytes on Processor 0
>   Instruction cache size 16 Kbytes
>   Data cache size 16 Kbytes
>
>MacLucasUNIX writes checkpoints after each 5000 iterations, what makes
>about 2h 20 min (it isn't `clean' CPU time but almost 98% of CPU was 
>devoted to MacLucasUNIX), so I've run Mlucas 2.7 for the same exponent and
>5000 iterations (MacLucasUNIX was stopped so almost all CPU time was
>assigned to Mlucas). Here are the results
>
> Enter p,n (set n=0 for default FFT length) > 3355031,262144 
> Enter 'y' to run a self-test, <return> for a full LL test > y
> Enter number of iterations for timing test> 5000
>  p is prime...proceeding with Lucas-Lehmer test...
> M(  3355031 ): using an FFT length of  262144
>  this gives an average  12.798427581787109 bits per digit
> INFO: Using real*16 for FFT sincos and DWT weights tables inits.
>    5000 iterations of M 3355031 with FFT length  262144
> Res64: 6E592176A2A59208. Program: E2.7y
> Clocks = 01:49:34.963
>
>About 30 min faster! 

That's encouraging - now try the same exponent at runlength 192K (196608) -
you should get the same Res64, but your time should be even better.

>I've used the options provided in the source file and haven't played
>with them.

Make sure you use -r4600 rather than -r10000, and -mips{whatever generation
the R4600 is, probably 2 or 3} rather than -mips4.

Your timing corresponds to 1.32 sec/iteration, compared to 0.185 sec at
256K for the same code on a 250MHz MIPS R10000. Even after adjusting for
the faster clock speed of the latter and accounting for the fact that the
R10000 is a more advanced processor, I think you should be able to get
somewhat better performance out of your R4600. If you do find flags that
give better performance, let me know.
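
The per-iteration figure above can be re-derived from the reported clock time. The following is only a rough sketch of that arithmetic; the helper name clock_to_seconds and the variable names are illustrative and not part of Mlucas, and scaling by clock frequency alone is a simplifying assumption that ignores cache and architectural differences.

```python
def clock_to_seconds(hms):
    """Convert an 'HH:MM:SS.sss' clock string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

ITERATIONS = 5000

# Timing reported for the R4600 run (133 MHz) at FFT length 262144.
r4600_sec_per_iter = clock_to_seconds("01:49:34.963") / ITERATIONS

# Figure quoted for the 250 MHz R10000 at the same FFT length.
r10000_sec_per_iter = 0.185

# Raw slowdown, then the slowdown left after dividing out the clock ratio
# (assumption: performance scales linearly with clock frequency).
raw_ratio = r4600_sec_per_iter / r10000_sec_per_iter
clock_ratio = 250.0 / 133.0
per_clock_ratio = raw_ratio / clock_ratio

print(f"R4600: {r4600_sec_per_iter:.3f} sec/iteration")
print(f"raw slowdown vs R10000: {raw_ratio:.2f}x")
print(f"slowdown after clock adjustment: {per_clock_ratio:.2f}x")
```

The gap that remains after the clock adjustment is the part that better compile flags might or might not reduce.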

>Some remarks:
> 1. In the interactive mode entering <return> instead of `y' causes
>    a misprint in comments:
> Enter p,n (set n=0 for default FFT length) > 
> Enter 'y' to run a self-test, <return> for a full LL test > 
> p is prime...proceeding wit
>!! >>>>> a line is broken here! It doesn't happen when I replied `y'

That sounds like a compiler error, but there's an easy workaround - for
full LL tests you should be entering exponents into the worktodo.ini
file anyway, and that avoids having to enter any input.

>2. For small Mersenne exponents an `exit carry' error occurs. It's
>   happened for M787 & M797
>
> M(      797 ): using an FFT length of     512
>  this gives an average  1.556640625 bits per digit
>
> INFO: Using real*16 for FFT sincos and DWT weights tables inits.
> FATAL: iter= 225  nonzero exit carry in radix16_ditN_cy_dif1.

The carry propagation routines in Mlucas assume that any "wraparound" carry
left over when one gets to the most-significant digit of the residue vector
will propagate no further than 4 digits into the low end of the residue,
i.e. that (exit carry) <= 2^(4*radix). If you use a very small radix, as
in the above case, this assumption may no longer be true.
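
To get a feel for how tight that bound becomes, here is a small sketch that evaluates 2^(4*radix) for the two runs in this thread. It assumes "radix" in the bound means the average number of bits per digit (p divided by the FFT length); the function name max_wraparound_carry is made up for illustration and does not correspond to any Mlucas routine.

```python
def max_wraparound_carry(p, fft_length, digits=4):
    """Largest exit carry that fits in `digits` low-order digits,
    assuming each digit holds p/fft_length bits on average."""
    bits_per_digit = p / fft_length
    return 2.0 ** (digits * bits_per_digit)

# M797 at FFT length 512: about 1.56 bits per digit.
small = max_wraparound_carry(797, 512)

# M3355031 at FFT length 262144: about 12.8 bits per digit.
large = max_wraparound_carry(3355031, 262144)

print(f"M797 / 512:        carry bound ~ {small:.1f}")
print(f"M3355031 / 262144: carry bound ~ 2^{4 * 3355031 / 262144:.1f}")
```

Under this reading the bound for M797 is under a hundred, while for a well-matched FFT length it is beyond 2^51, which is why the problem only shows up for tiny exponents.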

>I know that these exponents have been tested, but there is a question
>whether it may happen for larger mersennes?

There shouldn't be any problem with large exponents (> 5000, say) since
for large p, one always has an available FFT length which is well-matched
to the number in question, i.e. gives > 10 bits per digit.
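
The "bits per digit" figure that Mlucas prints is just the exponent divided by the FFT length, so it is easy to check a candidate runlength before starting a test. A minimal sketch follows; the function name and the use of 10 as a cutoff simply mirror the rule of thumb above and are not taken from the Mlucas source.

```python
def bits_per_digit(p, fft_length):
    """Average number of bits of M(p) stored in each FFT digit."""
    return p / fft_length

def is_well_matched(p, fft_length, min_bits=10.0):
    """Rule-of-thumb check: more than min_bits bits per digit."""
    return bits_per_digit(p, fft_length) > min_bits

# The cases that appear in this thread, plus the suggested 192K runlength.
for p, n in [(3355031, 262144), (3355031, 196608), (797, 512)]:
    verdict = "ok" if is_well_matched(p, n) else "too few bits per digit"
    print(f"M({p}) at FFT length {n}: {bits_per_digit(p, n):.4f} bits/digit, {verdict}")
```

This only tests the lower end of the range; it says nothing about whether a given FFT length is too short for the exponent.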

>3. My LL (double checking) tests on IRIX machines with MacLucasUNIX
>  are in progress (50% and 75%). I think that there is no possibility
>  to switch to Mlucas during tests and starting from the begining

Indeed there is not - if your MLU runs are > 25% complete, finish them
using MLU and use Mlucas for subsequent tests.

On the other hand, if you find a greater speed gain using 192K runlength
and/or better compile options, the above 25% breakover threshold will be
increased.
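
The reasoning behind such a threshold can be sketched as follows: restarting from scratch with the faster program only pays off if the whole run at the new speed takes less time than the remainder of the run at the old speed. The numbers below come from the timings quoted earlier in this message (about 2h 20min versus 01:49:34.963 per 5000 iterations); the function name is illustrative, and the MacLucasUNIX figure is approximate and not clean CPU time, so the result is only a rough guide.

```python
def breakeven_fraction(old_time, new_time):
    """Fraction of a run already completed with the old program beyond
    which finishing with it is faster than restarting with the new one.

    Restarting costs new_time for the full run; continuing costs
    (1 - f) * old_time, so the two are equal at f = 1 - new_time / old_time.
    """
    return max(0.0, 1.0 - new_time / old_time)

# Seconds per 5000 iterations, from the timings quoted in this thread.
maclucas_seconds = 2 * 3600 + 20 * 60          # "about 2h 20 min"
mlucas_seconds = 1 * 3600 + 49 * 60 + 34.963   # Clocks = 01:49:34.963

threshold = breakeven_fraction(maclucas_seconds, mlucas_seconds)
print(f"breakeven at about {100 * threshold:.0f}% complete")
```

A shorter runlength or better compile flags would lower the new time and so push this breakeven point higher.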

Happy hunting,
-Ernst

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
