QA (was re: Mersenne: Purpose of the self-test; also, aren't P4s fast!)

Ken Kriesel Tue, 15 May 2001 22:16:55 -0700
At 10:23 PM 5/15/2001 -0000, "Brian J. Beesley" <[EMAIL PROTECTED]> wrote:
>On 14 May 2001, at 20:52, George Woltman wrote:
>> 
>> >Is the self-test in fact just to check
>> >that there's not something in the CPU which goes glitchy when running
>> >flat-out SSE2 code for hours on end?
>> 
>> Yes.  The QA suite that Ken Kriesel and Brian Beesley worked on does a
>> better job at testing edge conditions.  Of course, they'll need to
>> update that suite using the new limits.

For the time being I would like to continue focusing our QA efforts on the
general case, rather than on P4-specific limits, since

1) there are relatively few P4s

2) we haven't finished QA testing in all run lengths with V19 / V20
(sign up now for a limited number of large exponents still available!)

3) the V19/20 QA now ongoing establishes "gold" residues over a huge range
of exponents, against which other programs on other architectures
can also be tested.

....

>Actually most of the exponents in the test suite were chosen to 
>exercise code in a manner which is particularly hard on the "magic 
>numbers" involved in the collapsed DWT. 

Above is one of the methods of exponent selection.  As George Woltman
stated it: "Exponents that are close to a multiple of the FFT length."

Other methods used to select exponents for full LL tests in QA were:

1) Already double-checked good numbers (or triple- or higher checked).  
Rerun to verify we can reproduce known-good results of previous versions 
or other code.

2) Duplicate the primality result for all known Mersenne primes.
(A special case of the above.)

3) The uppermost and lowest prime exponents within the determined limits
for a given runlength.  Always test limit cases.  This goes a little further
than George's suggestion: "Those near the end of each range should be checked
for excessive convolution error."

4) The prime exponent on either side of each integer power of two within the 
range supported by the program.

5) Randomly selected prime exponents within the determined limits
for a given runlength; typically one per higher runlength.  The purpose of
this set is to smoke out any case which we did not imagine.

6) We also tested a few composite exponents, just in case they behaved
differently.

7) Except for #6, LL tests were mostly done on exponents that had already
run the gauntlet of trial factoring, P-1, and sometimes ECM.


>Perhaps it would help us if George could indicate the approximate 
>limits for each run length when the SSE2 code is in use.

Definitely.

>But could I just point out that there is a _potential_ benefit in 
>running selftests using ridiculously small exponents for the run 
>length being tested. Normally the maximum permitted roundoff error is 
>0.4; this means that a roundoff error will only be detected as such 
>on one in every five occasions on which it occurs. If we use a small 
>exponent then we could reduce the roundoff error limit to 0.1 (or 
>maybe even less) and therefore detect a much larger proportion of any 
>roundoff errors which might occur. The fact that the residual checked 
>at the end of each self-test may still be correct does not prove that 
>a hardware glitch has not occurred, though gross errors will of 
>course cause the selftest to fail for this reason.

If I recall correctly, this does not require "ridiculously" small exponents,
since the convolution error falls off quickly away from our usual upper
limit on exponent for a runlength.  Convolution error varies with 
run length, exponent, and shift count.
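
The "one in every five" figure quoted above can be reproduced with a small
Python sketch.  It relies on two things: the measured error is the distance
to the nearest integer, so a true error x > 0.5 shows up as 1-x (the
"reflection" mentioned in the 1999 message quoted further down), and, as an
assumption made purely for illustration, the true error in a bad rounding is
taken to be spread uniformly over (0.5, 1.0).  The function names are
hypothetical.

```python
def observed_error(x):
    """Distance from x to the nearest integer; never exceeds 0.5."""
    return abs(x - round(x))

def detected_fraction(limit, samples=1000):
    """Fraction of bad roundings (true error in (0.5, 1.0)) that get flagged.

    ASSUMPTION: true errors are sampled uniformly over (0.5, 1.0).
    A bad rounding is flagged only if its observed error exceeds `limit`.
    """
    hits = 0
    for i in range(samples):
        true_err = 0.5 + 0.5 * (i + 0.5) / samples
        if observed_error(true_err) > limit:
            hits += 1
    return hits / samples

print(detected_fraction(0.4))  # limit 0.4 catches true errors in (0.5, 0.6)
print(detected_fraction(0.1))  # limit 0.1 catches true errors in (0.5, 0.9)
```

Under that uniform assumption a limit of 0.4 flags one bad rounding in five,
and a limit of 0.1 flags four in five.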

Here is a case where we know the expected answer with high certainty.
These lines (exponent, runlength, # of iterations, shiftcount?) were
entered as qadata.txt:
1279,40,1277,0,xxxxxxxxxxxxxxxx
1279,48,1277,0,xxxxxxxxxxxxxxxx
1279,56,1277,0,xxxxxxxxxxxxxxxx
1279,64,1277,0,xxxxxxxxxxxxxxxx
and the output was, respectively:
Exp/iters: 1279/1277, res: 7653615CCA7AB4C0, maxerr: 4.000000, maxdiff: 131072.000000000/529374.103214235
Exp/iters: 1279/1277, res: 5C21BA6CB463E665, maxerr: 0.500000, maxdiff: 128.000000000/393.066381654
Exp/iters: 1279/1277, res: 0000000000000000, maxerr: 0.157410, maxdiff: 1.000000000/2.342300785
Exp/iters: 1279/1277, res: 0000000000000000, maxerr: 0.001817, maxdiff: 0.007812500/0.051145460
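
The expected answer for 1279 can be confirmed independently of any FFT code
with exact integer arithmetic.  The Python sketch below runs the plain
Lucas-Lehmer recurrence for p-2 iterations; a zero residue means 2^p-1 is
prime.  The function names are hypothetical, and formatting the residue as
the low 64 bits in hex is an assumption about what "res" shows.

```python
def lucas_lehmer_residue(p, iterations=None):
    """Return s after `iterations` steps of s -> s*s - 2 (mod 2^p - 1).

    Starts from s = 4 and defaults to the full p - 2 iterations.
    Uses exact Python integers, so there is no roundoff error at all.
    """
    m = (1 << p) - 1
    if iterations is None:
        iterations = p - 2
    s = 4
    for _ in range(iterations):
        s = (s * s - 2) % m
    return s

def low64_hex(s):
    """Low 64 bits of s as 16 hex digits (ASSUMED to match the 'res' field)."""
    return format(s & 0xFFFFFFFFFFFFFFFF, "016X")

print(low64_hex(lucas_lehmer_residue(1279, 1277)))
```

Since 2^1279-1 is a known Mersenne prime, this prints 0000000000000000,
matching the runs at lengths 56 and 64 above.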

>Date: Fri, 07 May 1999 01:48:12 -0500
>To: George Woltman <[EMAIL PROTECTED]>
>From: Ken Kriesel <[EMAIL PROTECTED]>
>Subject: Re: more shift count variation qa results of 4423
>
>At 09:07 AM 1999/05/06 -0400, you wrote:
>>Hi Ken,
>>
>>At 12:29 AM 5/6/99 -0500, you wrote:
>>>The cutoff at 4365 seems a little conservative; 4423 seems quite happy with
>>>shift counts
>>>that push it to maxerr 0.48. 
>>
>>To me that indicates you could well run into a shift count that creates
>>a maxerr > 0.50.
>
>You are correct; I did a baby perl script to generate big qa files, ran 4423 
>through every shift count 0-999 and got:
>980 of 0 residue
>20 each of unique wrong residues.  So the 2% probability of false negative
>is not so good.  Of course the program never states a maxerr>0.50; it "reflects"
>the values x> 0.50 to (1-x).  I think I'll try a few neighboring exponents
>to get an idea of the slope of the probability of bad residues.
>
>M4423 run 4421 iterations with fftl 192 and shift counts 0-999
>res 000000000000000 count 980
>res 04E0C944C91F9CE count 1
>res 1B92B6C64D8C0EB count 1
>res 2489897571D4776 count 1
>res 325D0B5486FF2AA count 1
>res 3502DA8AC1D68FD count 1
>res 3B5CF6D36D0FA3E count 1
>res 3FF0055EE85448B count 1
>res 4988524F03A6A11 count 1
>res 5D11B174FA0B038 count 1
>res 69A39F63A28D9D1 count 1
>res 6A26978370C9A17 count 1
>res 9E6C0BDDC8FFBC6 count 1
>res B2385807C567EA0 count 1
>res BC35F15EF7A5CF8 count 1
>res C2C21EA69FD21DF count 1
>res C8880763EF75BCA count 1
>res CFC67D001F69E45 count 1
>res DFC640B046F7FF7 count 1
>res EA4B46F31453796 count 1
>res FC75C83945A6029 count 1
>
>M4373 run 4371 iterations with fftl 192 and shift counts 1-999
>res EF677D079BEB768 count 999
>
>M4391 run 4389 iterations with fftl 192 and shift counts 0-999 attempted:
>res 723CFC572EAB034 count 836
>sumout !=sumin stopped the show at this point.
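
A shift-count sweep like the one quoted above could be scripted roughly as
follows.  This is a Python sketch, not the original perl script: the field
order simply copies the qadata.txt lines shown earlier, the residue
placeholder is the same string of x's, and the function names are
hypothetical.

```python
def qa_lines(exponent, run_length, shift_counts, residue="xxxxxxxxxxxxxxxx"):
    """Build one qadata-style line per shift count.

    ASSUMED field order: exponent, runlength, iterations, shift count, residue.
    The iteration count is p - 2, as in the runs quoted above.
    """
    iterations = exponent - 2
    return [
        f"{exponent},{run_length},{iterations},{shift},{residue}"
        for shift in shift_counts
    ]

def wrong_fraction(residues, expected):
    """Fraction of reported residues that differ from the expected one."""
    wrong = sum(1 for r in residues if r != expected)
    return wrong / len(residues)

# Every shift count 0-999 for M4423 at run length 192.
lines = qa_lines(4423, 192, range(1000))
print(lines[0])
print(len(lines))
```

Feeding the reported residues back into wrong_fraction gives the bad-residue
rate directly; 980 zero residues and 20 wrong ones give 0.02, the 2% figure
in the message.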


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers