Re: Mersenne: Re: Roundoff errors

2002-07-24 Thread Daran

- Original Message -
From: "George Woltman" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, July 24, 2002 3:01 AM
Subject: Mersenne: Re: Roundoff errors

> Daran mentioned that he was only doing P-1 factoring,
> leaving the LL test for others.

That was correct.  I have just started an LL test on your advice with roundoff
checking on.  If it completes, I will return to doing P-1 only.

I'm satisfied that there is no evidence of any problem not attributable to
overheating.

Regards

A very cool running Daran G.





Mersenne: Re: Roundoff errors

2002-07-23 Thread George Woltman

At 06:49 PM 7/23/2002 -0400, [EMAIL PROTECTED] wrote:
> >The M12898349 error may be OK as it may have been very
> >close to the limit of the 640K FFT (I don't recall the
> >22.3 crossover points).
>
>According to the GIMPS status page, the 640K crossover
>point for the more-accurate x86 client (i.e. the non-
>SSE2 code) is 12.83M,

Thanks to Brian's research we've revisited all the x87 crossovers.
Some v22 executables had a 12.90M crossover.  This has been
lowered to 12.89M in v22.7.

>Except that P-1 stage 1 (unless I'm mistaken about
>Prime95's particular implementation thereof) uses the
>same FFT-based squaring code that the LL test does.
>it seems curious that Daran mentions seeing the roundoff
>warnings in p-1 stage 1, and not in the subsequent LL
>tests of these exponents. George, is error checking done
>more frequently in p-1 than in LL?

Daran mentioned that he was only doing P-1 factoring,
leaving the LL test for others.  The P-1 factoring code
does error checking just as often as the LL code.
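
To see why the two workloads exercise the same code path, here is a minimal
illustrative sketch (plain Python bignums standing in for the FFT-based
multiply; the exact exponent E and base shown here are assumptions on my part,
not necessarily what Prime95 uses): both the LL loop and P-1 stage 1 are
essentially long chains of squarings modulo 2^p - 1.

from math import gcd, lcm

def lucas_lehmer(p):
    """LL test of 2^p - 1 (p an odd prime): p - 2 modular squarings."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m              # one modular squaring per iteration
    return s == 0

def pminus1_stage1(p, B1):
    """P-1 stage 1 for 2^p - 1 with bound B1: compute 3^E mod 2^p - 1,
    which binary exponentiation turns into mostly squarings."""
    m = (1 << p) - 1
    E = 2 * p * lcm(*range(1, B1 + 1))   # factors of 2^p - 1 are q = 2kp + 1
    x = pow(3, E, m)
    g = gcd(x - 1, m)
    return g if 1 < g < m else None      # a factor, if q - 1 divides E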





Mersenne: Re: Roundoff errors

2002-07-23 Thread EWMAYER
Daran wrote:

>Twice, in the past three days I've had the message:
>"Possible roundoff error (0.4xxx), backtracking to
>last save file.", while performing P-1 stage one on
>M13194707 and M12898349 respectively.

George Woltman replied:

>The M12898349 error may be OK as it may have been very
>close to the limit of the 640K FFT (I don't recall the
>22.3 crossover points).

According to the GIMPS status page, the 640K crossover
point for the more-accurate x86 client (i.e. the non-
SSE2 code) is 12.83M, which is already less than the
smaller of Daran's exponents - the SSE2 crossovers
(irrespective of the precise code version) are going
to be smaller still. Thus, neither of these exponents
should be yielding close-to-crossover roundoff errors,
since both should be running in the lower part of the
768K-FFT-length exponent range.

>
>Errors during P-1 are not particularly critical.

Except that P-1 stage 1 (unless I'm mistaken about
Prime95's particular implementation thereof) uses the
same FFT-based squaring code that the LL test does. If
there's a hardware problem or (say) a device driver that
is corrupting data, it would seem to be just as likely
to do so in either the p-1 step or the LL test. Still,
it seems curious that Daran mentions seeing the roundoff
warnings in p-1 stage 1, and not in the subsequent LL
tests of these exponents. George, is error checking done
more frequently in p-1 than in LL? Daran, were you doing
p-1 ONLY on these exponents, or were these the first
Prime95 runs you did on some new machines? (In which case
the code perhaps hadn't gotten to the LL stage yet.)

-Ernst


Mersenne: Re: Roundoff errors

2000-05-10 Thread EWMAYER

I wrote:

 > Especially for large runlengths (and after the first few hundred iterations
 > or so), rounding errors tend to be randomly distributed in an approximately
 > Gaussian fashion,

Brian Beesley wrote:

> I know perfectly well what you mean, but these two statements tend to
> contradict each other. Gaussian distributions are continuous &
> smooth; we have instead a discrete distribution whose gaps tend to
> increase with size.

I added some code to the Mlucas 2.7a source (my personal copy, not the version
on my website) to better-quantify this. Since fractional errors are defined
in the interval [0, 0.5], I subdivided this interval into 64 equal-sized bins
and counted how many fractional parts fell into each bin on each iteration.

Then I ran around a hundred iterations of M1325011 using an FFT length of 64K,
i.e. an exponent close to the practical roundoff limit for that FFT length on
an IEEE64-compliant machine. Up to iteration 13, all fractional errors fell
into bin 0, i.e. were in [0, 1/128). After that, things looked as follows -
In the table below, the vertical placement corresponds to bins 0 through 64
(bin 64 corresponds to the special case of an error = 0.5), and the table
entries denote the number of fractional errors in the corresponding bin:

                                 iteration
bin     14     15     16     17     18     19     20     21   ...     44
---  -----  -----  -----  -----  -----  -----  -----  -----        -----
  0  65530  65457  64930  61600  44951  23245   9680   7781         7555
  1      6     76    555   3332  15541  19547  10589   7371         7158
  2      0      3     44    475   3573  11771  12090  10433        10356
  3      0      0      7     96    972   5620   7639   5971         5961
  4      0      0      0     30    348   3089   8518   9169         9353
  5      0      0      0      3     84   1065   4321   4033         4042
  6      0      0      0      0     55    740   4751   5804         5841
  7      0      0      0      0      5    164   1923   2462         2489
  8      0      0      0      0      6    199   2708   4603         4730
  9      0      0      0      0      0     32    743   1262         1290
 10      0      0      0      0      1     48   1149   2203         2185
 11      0      0      0      0      0      2    266    594          616
 12      0      0      0      0      0     10    672   1803         1890
 13      0      0      0      0      0      0     62    277          259
 14      0      0      0      0      0      3    206    591          606
 15      0      0      0      0      0      0     21    110          118
 16      0      0      0      0      0      1    137    614          636
 17      0      0      0      0      0      0      3     43           49
 18      0      0      0      0      0      0     23    146          121
 19      0      0      0      0      0      0      0     19           22
 20      0      0      0      0      0      0     29    164          162
 21      0      0      0      0      0      0      0      5            6
 22      0      0      0      0      0      0      2     14           20
 23      0      0      0      0      0      0      0      1            2
 24      0      0      0      0      0      0      4     49           51
 25      0      0      0      0      0      0      0      2            0
 26      0      0      0      0      0      0      0      4            3
 27      0      0      0      0      0      0      0      1            0
 28      0      0      0      0      0      0      0      6           11
 29      0      0      0      0      0      0      0      0            0
 30      0      0      0      0      0      0      0      0            0
 31      0      0      0      0      0      0      0      0            0
 32      0      0      0      0      0      0      0      1            4
 33      0      0      0      0      0      0      0      0            0
 34      0      0      0      0      0      0      0      0            0
 35      0      0      0      0      0      0      0      0            0
 36      0      0      0      0      0      0      0      0            0
 37      0      0      0      0      0      0      0      0            0
 38      0      0      0      0      0      0      0      0            0
 39      0      0      0      0      0      0      0      0            0
 40      0      0      0      0      0      0      0      0            0
 41      0      0      0      0      0      0      0      0            0
 42      0      0      0      0      0      0      0      0            0
 43      0      0      0      0      0      0      0      0            0
 44      0      0      0      0      0      0      0      0
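
For reference, here is a small sketch of the kind of per-iteration binning
described above (my own reconstruction in Python, not the actual Mlucas 2.7a
instrumentation): the fractional error of each convolution output lies in
[0, 0.5], which is split into 64 bins of width 1/128, with a 65th slot (bin
64) reserved for the fatal err = 0.5 case.

def bin_fractional_errors(fractional_errors, nbins=64):
    counts = [0] * (nbins + 1)
    for x in fractional_errors:
        if x >= 0.5:
            counts[nbins] += 1                      # "bin 64": err exactly 0.5
        else:
            counts[int(x / (0.5 / nbins))] += 1     # bin width 0.5/64 = 1/128
    return counts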

Re: Mersenne: Re: Roundoff errors

2000-05-09 Thread Brian J. Beesley

On 7 May 00, at 17:41, [EMAIL PROTECTED] wrote:

> A large significant (non-error)
> part of the convolution coefficient means that any accumulated rounding
> errors will collect in the least-significant few bits of the floating-point
> mantissa. That's why errors close to 0.5 tend to come in the form of
> (integer)/(small power of 2).
> 
> b) Especially for large runlengths (and after the first few hundred iterations
> or so), rounding errors tend to be randomly distributed in an approximately
> Gaussian fashion,

I know perfectly well what you mean, but these two statements tend to 
contradict each other. Gaussian distributions are continuous & 
smooth; we have instead a discrete distribution whose gaps tend to 
increase with size.

If we have a mechanism which tends to "chop down" the result of a 
floating-point operation towards zero when the FPU register isn't 
accurate enough to contain it all, and we're down to 4 guard bits, 
then an actual rounding error of 0.49 could be represented as 
0.4375, i.e. there is almost no safety at all between 0.4375 and 0.5. 
If we _ever_ see a value between 0.4375 and 0.5, we must have more 
than 4 guard bits, and the logic of your argument is that we are 
probably OK if we see 0.4375's, but not 0.46875's.

We should probably find out how much safety we have - i.e. can we 
provoke a 0.46875, or even a 0.484375?
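
To put numbers on that (my own illustration): each of those values is a
dyadic rational, and the power of two in its denominator is exactly the
number of bits below the binary point needed to observe it.

from fractions import Fraction

# 0.4375, 0.46875 and 0.484375 convert to Fraction exactly (all are dyadic);
# the denominator shows how many guard bits must survive to see each one.
for e in (0.4375, 0.46875, 0.484375):
    f = Fraction(e)
    bits = f.denominator.bit_length() - 1
    print(e, "=", f, "-> needs", bits, "bits below the binary point")
# 0.4375 = 7/16 -> 4 bits; 0.46875 = 15/32 -> 5 bits; 0.484375 = 31/64 -> 6 bits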

As for fixing the problem (when we are in a position to make a 
rational analysis of what constitutes a problem!) there would seem to 
be an automatic recovery method, as follows:

1) Unpack the work vector for the previous iteration & recover the 
true residual.
2) Generate a new work vector for the next FFT run length & pack the 
residual into it.
3) Run one iteration - i.e. re-run the iteration that caused the 
excess roundoff panic.
4) Unpack the work vector & recover the true residual after running 
this single iteration.
5) Generate a new work vector for the original FFT run length & pack 
the residual into it.
6) Continue as if nothing had happened.
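
A schematic sketch of that recovery loop (the names and the one_iteration
interface are placeholders of mine, not actual Mlucas or Prime95 internals):

def iterate_with_recovery(residue, p, fft_len, bigger_fft_len, n_iters,
                          one_iteration, error_limit=0.4):
    # one_iteration(residue, p, run_length) is assumed to perform a single
    # LL squaring mod 2^p - 1 at that run length and return
    # (new_residue, max_fractional_error); error_limit is an assumed panic
    # threshold.
    for _ in range(n_iters):
        new_res, maxerr = one_iteration(residue, p, fft_len)
        if maxerr > error_limit:
            # Steps 1-5: starting from the known-good residue, redo just this
            # one iteration at the larger run length (repacking the residue
            # for that length), then drop back to the original length.
            new_res, _ = one_iteration(residue, p, bigger_fft_len)
        residue = new_res            # Step 6: continue as if nothing happened
    return residue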

This will keep the speed high except for a few isolated iterations. 
The argument here is that running the whole test with a larger FFT 
run length is computationally expensive; it's very likely to be 
unnecessary (unless the FFT run length breakpoint is grossly 
incorrect - in which case a run will contain a great many 
"recoveries"); finally, even if an excess rounding error does slip 
through & cause an incorrect result, double-checking with a different 
program will pick it up.

Prime95 is a bit different:

The extra 11 bits of mantissa in the Intel FPU registers cause the 
roundoff errors to be apparently much more smoothly distributed, so, 
even if we do see the odd 0.45..., we _might_ still be OK.

Prime95 working in its default mode only checks convolution error 
every 128 iterations, so there's _some_ chance that things do, in 
fact, slip through. The safeguard here is that repeating the run with 
a different offset (as in a double-check run) will give a different 
distribution of roundoff errors, and (if the calculation went wrong 
as a result of an excess rounding error) it will not go wrong at the 
same place again - so the final residual will be different.


Regards
Brian Beesley



Mersenne: Re: Roundoff errors

2000-05-07 Thread EWMAYER

Richard Otter wrote:

{snip}
> M10068757 Roundoff warning on iteration 9803267 maxerr =  0.4375
> M10068757 is not prime. Res64: AEED24F91193EE6F. Program: E2.7z  

Your result is in all likelihood fine - I've done double-checks that
gave lots more roundoff errors of the 0.4375 variety and still gave
the correct result. Yes, 0.4375 is quite close to the fatal 0.5 level,
but the reason it's highly unlikely one of those 0.4375's was really
a 0.5625 which was aliased (by the error detection algorithm) to a 0.4375
is as follows:

a) large errors like the above are the result of large convolution
coefficients during the FFT-based squaring. A large significant (non-error)
part of the convolution coefficient means that any accumulated rounding
errors will collect in the least-significant few bits of the floating-point
mantissa. That's why errors close to 0.5 tend to come in the form of
(integer)/(small power of 2).
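
A quick numerical illustration of (a), with made-up magnitudes: if a
convolution coefficient is of size about 2^48, then IEEE-754 doubles near it
are spaced 2^(48-52) = 1/16 apart, so any accumulated rounding error can only
show up as a multiple of 1/16 - e.g. 7/16 = 0.4375.

import math

coeff = 2.0**48 + 0.4375           # pretend convolution output near 2^48
print(math.ulp(coeff))             # 0.0625: doubles this size step by 1/16
print(abs(coeff - round(coeff)))   # 0.4375 - necessarily a multiple of 1/16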

b) Especially for large runlengths (and after the first few hundred iterations
or so), rounding errors tend to be randomly distributed in an approximately
Gaussian fashion, and the distribution is essentially identical from one
iteration to the next. That means that if one of those 0.4375's was really
a 0.5625 aliased to a 0.4375, you'd very likely have seen a 0.5000 (which would have
caused the test of that exponent to be halted) at some point. Statistically,
for a run like yours where errors above 0.4 are few, an error of 0.5625 is
going to be rather rarer than a 0.5, i.e. you'd be quite unlikely to
get an error of 0.5625 (which would evade detection) but not see any 0.5000's
(which would not).

> I am running Mlucas_2.7z.ev4 on a Compaq Alpha machine.

For you to see any roundoff errors for p ~ 10.07M seems unusual (the other
such cases I've seen are all at 10.08 or above), so I'm guessing that you're
running under DEC Unix 4.0.  In 4.0, there's a bug in the real*16 trig
library which Mlucas uses for sincos tables - for angular arguments
of the form theta = n*pi/2 +- pi/512, the sin and cos values can be
incorrect. I first encountered this 1 1/2 years ago and built in a
workaround for it - the code compares the real*16 and real*8 values
and, if the difference exceeds an error threshold, uses the latter.
This costs a small amount of accuracy, but when you're really close
to the default exponent limit that's all it takes.
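
For concreteness, a rough sketch of that workaround (Python with mpmath
standing in for real*16; the threshold value and function names here are my
own placeholders, not the actual Mlucas source):

import math
from mpmath import mp, mpf, cos as qcos

mp.prec = 113                      # roughly quad (real*16) precision

def safe_cos(theta, tol=1e-12):
    # Compare the extended-precision value against the plain double one and
    # fall back to the double if they disagree badly (the buggy-trig-library
    # case), at a small cost in accuracy.
    q = float(qcos(mpf(theta)))
    d = math.cos(theta)
    return d if abs(q - d) > tol else q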

Also, I suggest you upgrade to v2.7a at your earliest convenience;
this will allow the program to continue on to the next exponent in
the worktodo.ini file if the current run quits due to a fatal error
(see below). 2.7a also supports runlengths of 288K and 576K (whereas
2.7z jumps to 320K and 640K, respectively), which means less of a
timing jump when your exponents cross the 5.15M and 10.11M thresholds.

*   *   *   *

WHAT TO DO IF YOU GET A FATAL ERROR MESSAGE:

Several users have reported encountering fatal convolution errors
while testing exponents close to the default limits for FFT lengths 256K
and 512K. If this happens using v2.7z, the program will simply quit.
Using v2.7a, the program will simply halt the test of the current
exponent and move on to the next one in the worktodo.ini file.
In either case, the previous-checkpoint savefiles for the halted
exponent will still be there, and I have made available a higher-accuracy
(and slightly slower) version of the 2.7a code which should enable you to
safely finish these runs. Check out

ftp://209.133.33.182/pub/mayer/README.html

Sorry about any inconvenience this may cause - as more people use the
program, I'll be able to refine the runlength/exponent breakpoints based
on observed error behavior.

Cheers,
-Ernst
