RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-07 Thread Preda Mihailescu

> 
> Yes, they can and do.   In the field of computational number theory, several
> participants in the Cunningham project have been active much longer than my
> decade.  Sam Wagstaff has been co-ordinating it longer than that.  To be
> honest, I don't know when he started.  Perhaps Peter Montgomery (another
> long-time participant) can inform us, as I know he's also on the Mersenne
> list.
> 

I have been working on a PhD on proving primes for ten years of Sundays, Paul :-)


Preda


> In other fields, it's not unusual to commit to a decade or more.  Consider
> the deep space missions such as Voyager, or the astronomers who monitor a
> particular class of objects for most of their working lives.  I know some
> amateurs who have been observing particular variable stars since the 1960's;
> George Alcock has been searching for comets and novae since the late
> fifties.
> 
> 
> Paul



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-07 Thread Peter-Lawrence . Montgomery

Paul Leyland (whom I'll meet for the second time later this month) writes:

> > From: Brian J. Beesley [mailto:[EMAIL PROTECTED]]
> 
> > A couple of points here:
> > (1) Can anyone honestly commit to a project for a whole decade?
 
 
> Yes, they can and do.   In the field of computational number theory, several
> participants in the Cunningham project have been active much longer than my
> decade.  Sam Wagstaff has been co-ordinating it longer than that.  To be
> honest, I don't know when he started.  Perhaps Peter Montgomery (another
> long-time participant) can inform us, as I know he's also on the Mersenne
> list.
 
> In other fields, it's not unusual to commit to a decade or more.  Consider
> the deep space missions such as Voyager, or the astronomers who monitor a
> particular class of objects for most of their working lives.  I know some
> amateurs who have been observing particular variable stars since the 1960's;
> George Alcock has been searching for comets and novae since the late
> fifties.

Ten years for one computation seems a long time.  Yes, I have been 
contributing to the Cunningham project for a long time, using machines
at multiple institutions, but we can hardly commit for ten years.  
Jobs change today, at least in the USA.
A home machine may be stolen or damaged by fire or floods.  
My family wouldn't know what to do if I died.

For the very long LL computations, it is reasonable for two sites
A and B to start the computation, and do a checkpoint perhaps 
every 1000 iterations.  When A reaches its 7000-th iteration 
(or other multiple of 1000), it sends the checkpoint file to a central site, 
and can continue for another 1000 iterations if B has submitted a
consistent checkpoint for the 6000-th iteration, with different 
programs used for iterations 5001-6000.
Otherwise A is blocked (i.e., told to do something else until B catches up).
If A and B differ at the 6000-th iteration checkpoint, 
or if one times out (say six months with no activity),
a third site can restart from the (agreed) 5000-th iteration checkpoint.
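
A tiny sketch may make that server-side rule concrete.  This is only an
illustration of the blocking logic described above, using an in-memory table;
the names and the 1000-iteration interval are assumptions, not any actual
PrimeNet interface:

```python
INTERVAL = 1000   # iterations between checkpoints, as in the example above

# checkpoints[(exponent, site)] maps an iteration count to that site's
# checkpoint data (or a hash of it) -- a stand-in for real server storage.
checkpoints = {}

def submit(exponent, site, partner, iteration, data):
    """Record a checkpoint from `site`; return what that site should do next."""
    checkpoints.setdefault((exponent, site), {})[iteration] = data

    prev = iteration - INTERVAL
    if prev <= 0:
        return "CONTINUE"                      # nothing earlier to compare yet

    mine = checkpoints.get((exponent, site), {}).get(prev)
    theirs = checkpoints.get((exponent, partner), {}).get(prev)

    if theirs is None:
        return "BLOCKED"                       # partner hasn't reached `prev`; do other work
    if mine == theirs:
        return "CONTINUE"                      # agreement at `prev`; run the next interval
    return "MISMATCH"                          # a third site restarts from the last agreed checkpoint

# A at iteration 7000 may continue only if B's 6000-iteration checkpoint matches A's.
submit(35000011, "A", "B", 6000, b"res6000")
submit(35000011, "B", "A", 6000, b"res6000")
print(submit(35000011, "A", "B", 7000, b"res7000"))   # -> CONTINUE
```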

It is important that the different programs (e.g., Prime95,
MacLucas, LucasMayer) agree on a common checkpoint format.  Not only must the
central server be able to compare them, but a contributor might migrate from
a Pentium in 1996 to an Alpha today to a Merced in 2002.  

Of course this data at the central site will be voluminous,
if each exponent around 100 M averages 2.5 checkpoint files of 13 Mb each
(for iterations 5000 and 6000 and possibly 7000 in the example).
If there are 100,000 LL tests in progress, then
the central site will need 3250 Gb.  A terabyte is huge today but
may be common in 15 years.  In the 1970's much data was on
seven-track tapes.  I recall each held 1.9 M 60-bit words (14 Mb).
Today the NFS data for M619 is being processed on a filesystem with 16 Gb,
an 1100-fold increase over 25 years.

Optimizations can reduce the size stored.  For example, store the full
5000-th iteration checkpoint, but only hash(6000-th iteration) and
hash(7000-th iteration) where the hash function might return only 1024
bits.  When the second site sends a 6000-th iteration checkpoint, 
verify that it gives the same hash value.  If yes, then we assume
B's full file agrees with A's, and store the 6000-th iteration checkpoint
while deleting the 5000-th.  If no, the full 5000-th remains available 
for a third site to use.
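
As an illustration of this optimization (my own sketch, not code from any
GIMPS server; SHA-1 stands in for the 1024-bit hash mentioned above):

```python
import hashlib

def digest(checkpoint_bytes):
    """Stand-in for the ~1024-bit hash mentioned above (SHA-1 here, 160 bits)."""
    return hashlib.sha1(checkpoint_bytes).digest()

class ExponentRecord:
    """Server-side record: one full checkpoint plus hashes of later ones."""

    def __init__(self, full_iter, full_checkpoint):
        self.full_iter = full_iter                  # e.g. 5000
        self.full_checkpoint = full_checkpoint      # the only bulky file kept
        self.hashes = {}                            # iteration -> hash from the first submitter

    def first_submission(self, iteration, checkpoint_bytes):
        # Site A's later checkpoint: keep only its hash, discard the bulky file.
        self.hashes[iteration] = digest(checkpoint_bytes)

    def second_submission(self, iteration, checkpoint_bytes):
        # Site B's checkpoint for the same iteration: if the hash matches A's,
        # adopt it as the new full checkpoint and drop the older one; if not,
        # the old full checkpoint stays available for a third site.
        if digest(checkpoint_bytes) != self.hashes.get(iteration):
            return False
        self.full_iter, self.full_checkpoint = iteration, checkpoint_bytes
        self.hashes = {i: h for i, h in self.hashes.items() if i > iteration}
        return True

rec = ExponentRecord(5000, b"full checkpoint at 5000")
rec.first_submission(6000, b"checkpoint at 6000")
print(rec.second_submission(6000, b"checkpoint at 6000"))   # True: 6000 becomes the full checkpoint
```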

  Peter






RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-07 Thread Paul Leyland

> From: Brian J. Beesley [mailto:[EMAIL PROTECTED]]

> A couple of points here:
> (1) Can anyone honestly commit to a project for a whole decade?


Yes, they can and do.   In the field of computational number theory, several
participants in the Cunningham project have been active much longer than my
decade.  Sam Wagstaff has been co-ordinating it longer than that.  To be
honest, I don't know when he started.  Perhaps Peter Montgomery (another
long-time participant) can inform us, as I know he's also on the Mersenne
list.

In other fields, it's not unusual to commit to a decade or more.  Consider
the deep space missions such as Voyager, or the astronomers who monitor a
particular class of objects for most of their working lives.  I know some
amateurs who have been observing particular variable stars since the 1960's;
George Alcock has been searching for comets and novae since the late
fifties.


Paul



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-06 Thread Brian J. Beesley

On 6 Aug 99, at 2:47, Lucas Wiman wrote:

> The main way to speed this up is to have system C continue from the last
> residue that it is known that system A and B agree upon.

The way I looked at it, the PrimeNet server only has to store _one_ 
(64 bit) residue for each exponent in progress. As others have 
pointed out, shifting files >> 10 MB in size across slow dial-up 
links isn't going to be popular, but you can't have C continue from 
the last residue agreed by A and B, unless the agreed residue is 
known to the PrimeNet server.

10,000 users by 16 MB per file = 160 GB of disk space - I think Scott 
might have to invest in a new disk array!

>  Or, possibly
> even better (though more error prone) would be to have A and B recalculate
> from last agreed upon residue, and see if they now agree.

Now, that's a possibility. But suppose they don't, again. It's 
possible that this could happen if one of the systems was suffering 
from an esoteric hardware bug (possibly common to both systems, but 
they'd be running different offsets, so the FFT data would differ).

My "inefficient" scheme also allows reasonably quick recovery from 
one of the systems disconnecting from the project.
> 
> Won't this mean systems constantly getting interupted in the middle of an
> exponent segment? Or would the systems only continue on one segment when
> they finish the current one?

No - if I've exponent x suspended & I'm working on exponent y ( > x), 
then I just wait until I'm checking in y anyway, when I may get a 
message to continue x (or some other exponent(s)). If I don't, and y 
is now suspended too, I need another exponent to start work on.

If I find I have several exponents all of which are now OK to 
continue, I just pick the numerically lowest one to work on next.
> 
> This makes sense, though it might not be needed.  This scheme would work
> best on people willing to stay with GIMPS for the decade required to
> finish the range.  Many of these people have (fast) networks with lots of
> computers on them.  The check/double check computers might be right next
> to each other, owned, and run by the same person.

A couple of points here:
(1) Can anyone honestly commit to a project for a whole decade?
(2) I honestly think it's better if the two computers working on a 
particular exponent are _not_ owned/operated by the same person. This 
precaution removes the possibility of one deranged individual 
"contaminating" the project results by "forging" results.


Regards
Brian Beesley



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-05 Thread Lucas Wiman

> If B's checkpoint info doesn't match A's, the exponent is issued to
> another system C which calculates to the first checkpoint & check in,
> as before. If we now have two matching residuals, the two systems
> supplying them are told to continue and the system with the "wrong"
> result is told to abandon that exponent. If all three differ, issue
> the exponent to system D ...

The main way to speed this up is to have system C continue from the last
residue that systems A and B are known to agree upon.  Or, possibly
even better (though more error prone) would be to have A and B recalculate
from last agreed upon residue, and see if they now agree.  This would waste
at most 2 P90 months, without having to transfer it over the network, or have
system D doing half the agreed upon work.  I think it's safe to say that if
system A and B can agree on the residue at that point, then it is the correct
residue.

> As well as minimising wasted work, this scheme results in very little
> extra server traffic, since the systems all need to check in
> occasionally anyway. If Prime95 is altered so that it always works on
> the _smallest_ exponent it has "in hand", work should be completed
> reasonably quickly and in reasonable order. (Especially bearing in
> mind that this scheme completes double-checking at the same time as
> first tests!)

Won't this mean systems constantly getting interrupted in the middle of
an exponent segment?
Or would the systems only continue on one segment when they finish the
current one?

> When a prime is found, the discovery credit (& any prize money)
> should obviously be split between the two users who ran the last
> iteration on the exponent in question.

This makes sense, though it might not be needed.  This scheme would work
best on people willing to stay with GIMPS for the decade required to finish
the range.  Many of these people have (fast) networks with lots of computers
on them.  The check/double check computers might be right next to each other,
owned, and run by the same person.

This will not matter for some time.  Does anyone have current data for
% mistakes?

-Lucas




RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-05 Thread Brian J. Beesley

On 5 Aug 99, at 9:05, Eric Hahn wrote:

> Based on previous messages in this thread, I'm going
> to throw in my 2 cents...  (and avoid a lot of
> quoting!)

And I'm doing the same. I'd probably have reacted sooner if I hadn't 
been offline (getting a kidney stone fixed).

Several months ago I suggested a method which does require some work 
on both client and server, but which really does reduce wasted work 
to a minimum. The way it works is as follows:

Assign each exponent initially to two systems A and B. The software 
has a table of checkpoints built in (these can be very sparse for 
smaller exponents, becoming denser as the exponent increases, so as 
to maintain about 1 P90 month between checkpoints).

Suppose A reaches the first checkpoint before B. A stops working on 
the exponent & moves on to some other work already queued. It reports 
the iteration number and the low 64 bits of the residue to the 
PrimeNet server.

When B reaches the first checkpoint, it also stops working on the 
exponent & gets on with something else. It also reports the iteration 
number & the low 64 bits of the residue to the PrimeNet server. The 
PrimeNet server then checks to see if the checkpoint information 
matches that sent in by another system. If it does, it immediately 
sends a message to B instructing B that it can continue with the 
checked exponent up to the next checkpoint. It also tags a message to 
be sent to A instructing A to proceed with the checked exponent, 
which is communicated to A next time A checks into the server.

If B's checkpoint info doesn't match A's, the exponent is issued to 
another system C which calculates to the first checkpoint & checks in, 
as before. If we now have two matching residuals, the two systems 
supplying them are told to continue and the system with the "wrong" 
result is told to abandon that exponent. If all three differ, issue 
the exponent to system D ...
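
A small sketch of the server's decision at a checkpoint might look like the
following; the function and field names are illustrative assumptions, not
PrimeNet code:

```python
def resolve_checkpoint(reports):
    """reports: system id -> low 64 bits of the residue at this checkpoint.

    Returns (systems_to_continue, systems_to_abandon, need_another_system)."""
    by_residue = {}
    for system, residue in reports.items():
        by_residue.setdefault(residue, []).append(system)

    # Two or more matching residues: those systems continue, the rest abandon.
    for systems in by_residue.values():
        if len(systems) >= 2:
            return systems, [s for s in reports if s not in systems], False

    # Only one report so far: just wait.  All reports differ: issue to another system.
    return [], [], len(reports) >= 2

# A and C agree, B is the odd one out -> A and C continue, B abandons the exponent.
print(resolve_checkpoint({"A": 0x863FF87, "B": 0x1234567, "C": 0x863FF87}))
```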

The last iteration (completing the run) is treated in the same way as 
an intermediate checkpoint, except that there is no need for the 
PrimeNet server to send a "continue" or "abort" command - though this 
might provide useful feedback to the user, if it was done anyway.

As well as minimising wasted work, this scheme results in very little 
extra server traffic, since the systems all need to check in 
occasionally anyway. If Prime95 is altered so that it always works on 
the _smallest_ exponent it has "in hand", work should be completed 
reasonably quickly and in reasonable order. (Especially bearing in 
mind that this scheme completes double-checking at the same time as 
first tests!)

When a prime is found, the discovery credit (& any prize money) 
should obviously be split between the two users who ran the last 
iteration on the exponent in question.

> I think as the time increases for each LL test, there
> would be much more time savings in attempting to do
> higher trial-factoring.

Yes, trial factoring to a higher limit does help, but, unfortunately, 
as well as each extra bit of factoring depth costing as much time as all 
the previous work, the chance of finding a factor (in a "window" of 
any particular fixed size) decreases steadily as the factoring depth 
increases. There is also going to be a jump in factoring time 
somewhere around 62/63/64 bits due to hardware limitations of the 
Intel architecture.

Don't forget, factoring time is O(2^d/p) whereas LL testing time is 
"only" O(p^2 * log p), where p is the exponent and d is the factoring 
depth in bits.

Finally, although save files for exponents > 33 million are going to 
be a considerable size (14 MBytes for exponents just big enough to 
have 10 million decimal digits), it might be worth considering an 
_option_ where users (with fast network connections) could deposit 
save files for "safe keeping" (as a backup). As has been pointed out 
elsewhere, this would save time in the event that a user goes AWOL. 

Clearly it isn't reasonable to expect users with dial-in connectivity 
to transmit files > 10MBytes long with any frequency, but, for users 
with high-speed connections, it doesn't represent much of a problem.
The other issue is finding enough disk space to store a large number 
of these files ... they don't compress very well, either!

Regards
Brian Beesley



Re: Mersenne: Multiple residues - enhancing double-checking

1999-08-05 Thread Mikus Grinbergs

I agree with everything that Lucas said.

For the project to benefit, the participants would have to AGREE
to restructure the work.  Savings would result from stopping the
diverging work earlier.  (And if "passing around" intermediate
files was accepted, by starting any triple-check work later.)

In essence, the idea is to take that 1_year_to_complete LL test of
an exponent and BREAK IT DOWN into (n) discrete serial work units.

mikus


Proposed procedural steps:

(b) When 'Tester 1' requests work, assign ONLY work unit (1) to him.
However, the project RESERVES exponent (N) to 'Tester 1' -
only *he* will (in the future) be assigned initial testing of
further work units within exponent (N).

(c) Eventually, Tester 1 finishes that work unit, and begins to
work on some other work unit on some other exponent.  HOWEVER,
'Tester 1' saves the intermediate file (of how far he got with
exponent (N)).

Somewhere along the line, exponent (N) work unit (1) becomes
available to be assigned to a double-checker.

(d) Now, when 'Checker 2' requests work, assign exponent (N) work
unit (1) to him for double-checking.

Since this is work unit (1), there is as yet no intermediate
file for exponent (N).  If, however, this was a subsequent
work unit, and Checker 2 was *not* the person who performed
the preceding work unit's test/check, then Checker 2 might
OBTAIN the intermediate file from whoever did that preceding
work unit.

(e) When Checker 2 finishes work unit (1) of exponent (N), his
(work unit "final") residue is compared to the residue reported
for work unit (1) by Tester 1.  HOWEVER, 'Checker 2' saves the
intermediate file (of how far he got with exponent (N)).

(f) If the residues match, exponent (N) work unit (2) becomes
available to be assigned to 'Tester 1'.

Now, 'Tester 1' and 'Checker X' repeat steps (b) - (e), until
the residues of the FINAL work unit on exponent (N) agree.


If the residues of the most recently tested AND checked work
unit of exponent (N) do not match, then that work unit
becomes available to be assigned to a triple-checker.

(g) Now, when 'Checker 3' requests work, assign the diverging work
unit within exponent (N) to him for re-checking.

Checker 3 might OBTAIN from Tester 1 the intermediate file for
exponent (N) as of two work units back.  Checker 3 will
perform the first of those two work units (i.e., the one
previous to the divergence) to verify that his (work unit
"final") residue matches the (known good) one from Tester 1.

(h) Checker 3 will then go on to triple-check the diverging work
unit.  When Checker 3 finishes that work unit of exponent
(N), his (work unit "final") residue is compared to the
residue reported by Tester 1.

(i) If the residues match, the next work unit of exponent (N)
becomes available to be assigned to 'Tester 1', and testing
and checking continue the same as with a successful step (f).


If Checker 3's (work unit "final") residue does not match
Tester 1's, but does match Checker 2's, then exponent (N)
is un-reserved from 'Tester 1', and is instead RESERVED to
'Checker 2'.  Testing and checking of the work units of (N)
continue with step (i), where 'Checker 2' replaces 'Tester 1'.


(k) If Checker 3's (work unit "final") residue on the diverging
work unit of exponent (N) matches neither Tester 1's nor
Checker 2's, maybe the best thing to do is to start all over
with a new set of participants.


Comment:  If the breaking down of a long LL computation into discrete
  units of work is adopted, thought might be given to having
  prime95 generate a "universal interchange" intermediate
  file at the end of a unit of work - in other words, special
  output that would be acceptable as input across a number of
  operating systems and/or hardware chip types.


Note: For the sake of "fairness", becoming the 'reserved owner'
  on a 10-unit exponent might require 10 'credits' earned
  through doing checking.




In article <[EMAIL PROTECTED]>,
Lucas Wiman  <[EMAIL PROTECTED]> wrote:
> > This idea is rather obvious, and no, I don't remember seeing it either.
>
> This had been discussed earlier.  Brian and I talked about it for a little
> while, he came up with the original idea.
>
> > I think the idea has definite merit.  If an error does occur, it's equally
> > likely to happen at any step along the way, statistically.  Errors are every
> > bit as likely to happen on the very first iteration as they are during the
> > 50% mark, or the 32.6% mark, or on the very last iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.
>
> > Especially as the exponents get larger and larger, I see a *definite*
> > possibility to reduce double check t

Re: Mersenne: Multiple residues - enhancing double-checking, test factoring

1999-08-05 Thread Joth Tupper


- Original Message -
From: Eric Hahn <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 05, 1999 9:05 AM
Subject: RE: Mersenne: Multiple residues - enhancing double-checking


>
> The only thing this scheme would accomplish is to
> allow a third test to begin (if a mismatch is found
> somewhere along the way) before the second one
> finishes.  If the third test did match the first,
> there would be savings in discarding the second.

This thought may sound awfully like poaching, but here goes anyway...

Some folks want to do first time LL tests.  Some folks ask for best suited
tasks -- I do.

If I have a machine doing double-checks, I really do not know or care whether
it happens to start the double-check the same day as the first-time LL test
starts.  Primenet can track whether I am the first LL or the check LL.
If the check finishes before the first LL on a prime Mp, George will need to
set the rules.  Perhaps both parties get co-discoverer status rather than
discoverer status.  This could make concurrent double-checking a lot more
attractive.  Concurrent double-checking might speed up some of the
activities, especially finding the need for triple checks.

>Current trial-factoring saves 12.5% of the total time
>that would be spent without any trial-factoring.
>
>If we were to trail-factor to say, 2^70 (which should
>cause factor time to increase to 1 month), without any
>additional improvement to exponents being factored this
>will cause a drop to about 8.8% overall time saved.

Another thought:  does the factor test already use Goldbach's conjecture?

Someone with more cleverness and time than I have right now might think
about using this fact:

Given integers p and q with r = p+q, we have  Mr = Mp.Mq + Mp + Mq.

This is just:  Mp.Mq = (2^p - 1)(2^q - 1) = 2^(p+q) - 2^p - 2^q + 1 = Mr - Mp - Mq

So what, right?  Suppose we want to factor test Mr.  We can (by trial and
error) find
primes p and q with r = p+q.  Usually there are lots.  (Hey, if we don't
then we just found
a counterexample to Goldbach's conjecture.  That would be big news, too.)

Mr, Mp and Mq are relatively prime in pairs, so none of the factors of Mp or
Mq divide Mr.

Now, if F is a test factor in the range, suppose we keep a database of Fp =
Mp mod F.  This is big, but once we have r = p+q, we just calculate
Fp.Fq + Fp + Fq mod F.  If it is 0, then F divides Mr.
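
A small sketch of the arithmetic (my construction, not from this post) shows
how the stored residues would be combined.  The toy example uses r = 43 = 2 + 41
and F = 431, which is a known factor of M43:

```python
def mersenne_mod(p, F):
    """Fp = Mp mod F, i.e. (2^p - 1) mod F, via fast modular exponentiation."""
    return (pow(2, p, F) - 1) % F

def F_divides_Mr(r, p, q, F, table):
    """True if F divides Mr, using only the stored residues of Mp and Mq, r = p+q."""
    assert p + q == r
    Fp, Fq = table[p], table[q]
    return (Fp * Fq + Fp + Fq) % F == 0

# Toy example: r = 43 = 2 + 41 (both prime), and F = 431 divides M43.
F = 431
table = {p: mersenne_mod(p, F) for p in (2, 41)}   # the hypothetical database of Fp values
print(F_divides_Mr(43, 2, 41, F, table))           # True
```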

One practical question is how bloody big is this collection of Fp's?

Could this make test factoring run faster?

Joth



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-05 Thread Eric Hahn

Based on previous messages in this thread, I'm going
to throw in my 2 cents...  (and avoid a lot of
quoting!)

While the reporting of partial residues at specific
points, say every 10% (or 5% over 20M), isn't a bad
idea, it won't necessarily save a whole lot of time.

The only thing this scheme would accomplish is to
allow a third test to begin (if a mismatch is found
somewhere along the way) before the second one
finishes.  If the third test did match the first,
there would be savings in discarding the second.
What if, however, the third test matched the second?
Both tests would have to complete, and some safeguard
would have to be in place should the third test complete
before the second.  

This will increase the logistics a lot (but not make
it impossible).  The client would be required to report
most of the information to the server, without receiving
much from the server (a stop, go, or discard maybe).
A checksum figure as part of the work file, as somebody
suggested (shown below), wouldn't work well if the latter
case above occurred (2nd and 3rd test match).

>Double-Check:
>M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

Much more than the logistics, however, would be the
confusion facter.  This would likely increase
exponentially, especially when attempting to determine
the current overall status of the project.  We might
even have to have George committed eventually, when
he starts running his fingers up and down across his
lips or rubs the side of his head to the bone with his
fingers or palm of his hand... :-(

I think as the time increases for each LL test, there
would be much more time savings in attempting to do
higher trial-factoring.

Let's look at some figures using the following:
 1) We assume to be using a PII 233MHz.
 2) an LL test takes a year (for exponents around 19.3M)
(AND HEY! This is 4 to 5 years away!!)
 3) current trial-factoring to 2^64 takes 2 days if
no factor is found.
 4) currently, only about 13% of exponents are actually
factored, with the rest requiring an LL test.
 5) 2 LL tests are done for each exponent (an
original and a double-check) 

Current trial-factoring saves 12.5% of the total time
that would be spent without any trial-factoring.

If we were to trial-factor to, say, 2^70 (which should
cause factor time to increase to 1 month), without any
additional improvement to exponents being factored this
will cause a drop to about 8.8% overall time saved.

However, for each % improvement in exponents being
factored, overall time saved would increase by a little
over 1%.  As a result, if we were to improve to 20% of
the exponents being factored, we would save close to 16%
of the overall time spent without any factoring.
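
A quick sketch reproducing this arithmetic, under the five assumptions listed
above (12 months per LL test, two tests per exponent, 2 days to factor to
2^64, 1 month to 2^70); the function is my own illustration:

```python
def fraction_saved(f, factor_months, ll_months=12.0, tests_per_exponent=2):
    """Fraction of total LL time saved when a fraction f of exponents is factored."""
    without_factoring = tests_per_exponent * ll_months
    with_factoring = factor_months + (1.0 - f) * tests_per_exponent * ll_months
    return 1.0 - with_factoring / without_factoring

print(fraction_saved(0.13, 2 / 30.0))   # ~0.127: roughly the 12.5% quoted for 2^64
print(fraction_saved(0.13, 1.0))        # ~0.088: 2^70 factoring, still 13% of exponents factored
print(fraction_saved(0.20, 1.0))        # ~0.158: close to 16% if 20% of exponents factor
```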

Of course, even more time would be saved by factoring if an exponent
that would have been factored had it been trial-factored to 2^70 turned
out to need a triple-check during LL testing because it wasn't factored
during a trial-factoring run to 2^64.

In conclusion, even with a new algorithm that halves
the time for an LL test, 1 month to factor an exponent
still beats a year to perform an LL test and its
corresponding double-check. In addition, it should be
easier to speed up the factoring process than the LL
testing process.





RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-05 Thread Oscar Fuentes

From:   Lucas Wiman  <[EMAIL PROTECTED]>
Subject:        RE: Mersenne: Multiple residues - enhancing double-checking
To: [EMAIL PROTECTED]
Date sent:  Wed, 4 Aug 1999 19:19:56 -0400 (EDT)

> 
> >Double-Check:
> >M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.
> 
> This scheme makes almost no sense for normal double checking.  This is becuase
> it would save *no* time at all.  Think about it, even if you identify that an
> error ocurred in the second week of a 3-month test, 

The errors can occur with the same probability during the 
double-check. So, when a mismatch is found, the double-check 
stops work on this exponent and a triple check is started on another 
computer (hopefully with another program). The triple check 
decides who is wrong. If the double check was wrong, you saved a 
partial LL test. If the first check was wrong, the double check 
continues from where it stopped, this time as a first-time check. In 
either case, the triple check turns out to be a double check.

Check it ! :-)

After all, maybe the major advantage of all this is the 
ability to catch bugs early, when some people volunteer to run 
simultaneous LL tests with different programs. As time passes, the 
coordinator can compare results quickly. Note that some bugs can 
be very subtle. Nothing replaces testing in the field. I would 
like to know how George noticed the v17 bug.

> you still have to run it
> to completion, and a third test must also be run.  (So 3 LL tests must still
> be run if an error ocurrs).

Correct.
 
> >This schema makes possible simultaneous checking,
> > though. But the start-stop mechanism you describe has little
> > sense.

Sorry, I meant "has little sense given the individualism of 
certain GIMPS members". A lot of people say "It's _my_ 
exponent. Don't dare work on it before I finish!" Practical or not, I 
respect this attitude.


Saludos
Oscar Fuentes



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-05 Thread Ken Kriesel

At 03:01 PM 1999/08/04 -0600, "Aaron Blosser" <[EMAIL PROTECTED]> wrote:
>I think we could keep it simple by just saving every x% iteration's residue
>(in truncated form).  Using a WAG of saving it every 10% along the way,
>you'd only have 9 partial and 1 full residue when all is said and done.
>
>So for an exponent like 8027219, you'd save the partial residue at the 10%,
>or 802722th iteration (rounding up or down as normal).  Of course, the
>number of iterations varies just slightly from the exponent, but
>whatever...you get the idea.
>
>These partial residues could be sent (for the first LL test) during the
>check-ins, or saved up and all sent at once when the test is done.

The 2^n least significant bits of the residue make a good check value, 
since once the modulo operation begins to have an effect, the lsb's 
quickly change.

It is essential to have agreement among different programs, on exactly 
which iterations' residue LSB's are to be saved.  This encompasses both
the save interval in iterations, and agreeing on whether the starting
value of 4 is iteration 0, 1, 2, etc., as well as agreeing on the starting
value, which can be other values besides 4 according to the math.  
It is more efficient but not necessary to have agreement on how many bits 
to save.  Even n=4 (16 bits, 2 bytes) gives a fairly good error detection 
rate, but n=6 (64 bits, 8 bytes) is much better!

I suggest that the number of iterations between residue saves be made
a function of the FFT length, since those scale with the runtime for
a given cpu.  As we move to exponents that require months even on fast
machines, we will see an increase in the rate of erroneous results per
exponent on the same hardware and same small error rate per execution hour.
We will eventually reach exponents with runtimes long enough that the
system hardware reliability drops significantly before an exponent
completes, due to hardware aging!

My feeling is that a P-90-month is about the right interval at which
to kick out intermediate residues of 64-bits or so.

In a case where two runs are taken to completion, for a large exponent,
such that dozens of interim residues are recorded for the first LLtest
and for the second, the third tie-breaker need only be run to the point
where the first two diverge and the third agrees with one but not the
other.  On average this will save about half the third-test, which now
we need about 1% of the time, but in a few years we may need much more 
often.  Because the third test only runs to the point of divergence, its own
probability of being in error is roughly halved, which lessens the amount of
time spent on fourth-tests.  This saving will also become
more significant at higher exponents.

Total runtimes are going up drastically.  When I joined GIMPS in mid 1996,
runtimes per exponent were 12 hours (572k on a P-120).  Soon we will have 
the capability to run exponents requiring more than 12 months on PIII-500's.

>> Might the v.17 problem have been trapped with something like
>> this?  I do not
>> recall enough of the discussion to know and the ensuing belly-aching
>> overshadowed the real content of finding/fixing/reworking.  (I know I am
>> never going to rise high on the list, so I do not worry a whole lot about
>> how much my report shows.)

There is another trick for dealing with the v17 shift bug, which I've 
suggested to George Woltman.  Before the modulo kicks in, the rightmost 
2n+2 bits should follow a set pattern that is calculable for n < log2(p)
or so.  Math wizards, please be kind re errors in what follows.

In hexadecimal, the first several iterations for any value P of current
interest follows a predictable pattern.  The pattern persists until
the result of an iteration grows larger than Mp, and so is independent
of P for a time.
L(n+1)=Ln^2-2; (using the convention here that we iterate from 2 to n)
n  hex Ln               binary Ln
2  4
3  E
4  C2                   1100 0010
5  9302                 1001 0011 0000 0010
6  5468 4C02            0101 0100 0110 1000 0100 1100 0000 0010
7  1BD6 96D9 F03D 3002  ... 1101 0011 0000 0000 0010

least significant 64-bits only below:
8  8cc8 8407 a9f4 c002
9  5559 9f9d 37d3 0002
10 f460 d65d df4c 0002
11 e180 d807 7d30 0002
12 9bdb 491d f4c0 0002
13 4ceb b477 d300 0002

etc. for all n I found to check, provided Mp > Ln.  Since Ln <= 2^(2^(n-1)),
the pattern holds whenever
  Mp > 2^(2^(n-1))
  2^p > 2^(2^(n-1)) + 1
  p > 2^(n-1)
  log2(p) + 1 > n
e.g.  log2(33,219,281) = 24.985 > n
      log2(5,000,000)  = 22.25  > n
(In practice I've found the limits at low n occur at slightly lower p
than expected.)

That is, the rightmost 2(n+1) bits begin with binary 0011, then in the middle
there are 2(n-2) zero bits, then a 1 and another 0.
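
A short sketch (mine, not from this post) that reproduces the table above and
checks the stated low-bit pattern, assuming the exponent is large enough that
the reduction mod Mp has not yet kicked in:

```python
def low_bits_pattern(n):
    """Expected value of the rightmost 2*(n+1) bits of L(n), for n >= 4:
    binary 0011, then 2*(n-2) zero bits, then 10."""
    width = 2 * (n + 1)
    return (0b0011 << (width - 4)) | 0b10

def lucas_lehmer_low(n, bits=64):
    """L(n) reduced mod 2^bits only (no reduction mod Mp, i.e. exponent assumed large)."""
    mask = (1 << bits) - 1
    L = 4
    for _ in range(n - 2):          # iterate from L(2) = 4 up to L(n)
        L = (L * L - 2) & mask
    return L

for n in range(4, 14):
    L = lucas_lehmer_low(n)
    width = 2 * (n + 1)
    ok = (L & ((1 << width) - 1)) == low_bits_pattern(n)
    print(f"n={n:2d}  low 64 bits = {L:016x}  pattern holds: {ok}")
```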

For exponents 16,777,215 < ...



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-04 Thread Lucas Wiman

>>  That is to say when
>> one computer finishes to X%, it reports its 64-bit residue to primenet, and
>> waits for the second computer working on the same LL test to do the same.
>> Until the other (slower) computer reports in, the (faster) computer works on
>> another exponent.

>Not at all. The first-time check goes its way, but reporting
>partial residues to coordinator / primenet from time to time. Later,
>often when first LLTest was finished long time ago, somebody
>receives:

>Double-Check:
>M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

This scheme makes almost no sense for normal double checking.  This is because
it would save *no* time at all.  Think about it: even if you identify that an
error occurred in the second week of a 3-month test, you still have to run it
to completion, and a third test must also be run.  (So 3 LL tests must still
be run if an error occurs.)

>This schema makes possible simultaneous checking,
> though. But the start-stop mechanism you describe has little
> sense.

The method that you describe would only allow simultaneous checking if
the computers were of equal speed; otherwise one would keep working on the
same exponent while the other computer kept getting further behind.  The
scheme that I described (and Brian thought up) would allow the computers to
stay on the same exponent at the same time, while still keeping busy.

Sorry about my bad English, and it's even my first language!

-Lucas



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-04 Thread Aaron Blosser

> I like the thread of saving multiple residues at various checkpoints along
> the way.  George suggested a % completion series.  I might suggest a
> specific series of points -- like every L(1000k).  This might be
> simpler to
> track in a database although the number of entries grows linearly
> with p so
> the data storage might grow with p^2, depending.  Another series like
> k*floor(p/s) would work just as well and keep the data needs smaller as it
> would have just s+1 checkpoints (s can be fixed for all p).

I think we could keep it simple by just saving every x% iteration's residue
(in truncated form).  Using a WAG of saving it every 10% along the way,
you'd only have 9 partial and 1 full residue when all is said and done.

So for an exponent like 8027219, you'd save the partial residue at the 10%
mark, or the 802,722nd iteration (rounding up or down as normal).  Of course, the
number of iterations varies just slightly from the exponent, but
whatever...you get the idea.

These partial residues could be sent (for the first LL test) during the
check-ins, or saved up and all sent at once when the test is done.

> Might the v.17 problem have been trapped with something like
> this?  I do not
> recall enough of the discussion to know and the ensuing belly-aching
> overshadowed the real content of finding/fixing/reworking.  (I know I am
> never going to rise high on the list, so I do not worry a whole lot about
> how much my report shows.)

I don't think so, since it produced the wrong data right from the start,
regardless.  A double-check on a different platform (non GIMPS code like
MacLucas) would have caught it earlier though I suppose.





Re: Mersenne: Multiple residues - enhancing double-checking

1999-08-04 Thread Joth Tupper

It seems like there may be three (or more) kinds of problem:

- round-off error gets too large and creates an error (and other unknown
software issues)
- specific hardware failures producing errors
- random occurrences producing errors

mprime95 and the numerous ports and alternatives in GIMPS (I am not trying
to be callous towards Mac or Unix/Linux users -- I am too ignorant to be
callous) seem well able to catch several intermittent bugs in software.  I
know that I am frequently (well, several times a month on one machine or
another) hit with SUMOUT errors.  mprime95 just picks up at an earlier point
and restarts.  Occasionally the sumout error repeats but it is rarely at the
same iteration.  Something goes wrong there, but I cannot tell precisely
what.  Often the problem seems to crop up when some other piece of software
is misbehaving -- perhaps there is some memory violation that Windows does
not trap (ah, gee, ya think?).

It seems likely that each of these events is pretty much independent.  Both
expected specific hardware failures and random occurrences seem proportional
to running time.  Each machine and environment would have its own
probabilities (hard to know in advance -- probably hard to know at all).
The possible software glitches seem to be proportional to iterations and as
several comments observe, the machine (or processor) specific failure rates
may increase with CPU speed.  Still, I thought I saw that mprime95
algorithms for Mp run in something on the order of  p^2 ( log p) steps.
When p doubles, the LL runtime should go up by a factor slightly higher than
4 (asymptotically 4, and less than 4.2 at p>8 million).

Stirring this melange together, if we quadruple p (to get from a lot of
current testing up to 33 million) and double processor/RAM speed (well, my
fastest machine is a P-II 266Mhz with a 66Mhz bus -- moving to 550/100 might
double the throughput with available parts), the runtime would increase by
about a factor of 8.  I would naively expect the failure rate to increase by
a factor between 8 and 16, although it could be higher or lower because
those pesky probabilities all change.
Also, if the probability of a failure in any step is r and there are n
steps, then the probability of a failed run is really 1-(1-r)^n, which has
lead term nr (and is less than nr, of course).  If we expect
a 1% failure rate at 8M and r stays fixed, then we expect about a 4% failure
rate at around 33M. (.99^4 is about .96 = 1 - .04).
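
For what it's worth, a one-line check of that estimate (assuming a fixed 1%
failure probability per 8M-sized run and four times the work):

```python
def p_any_failure(p_fail_one_run, runs):
    """1 - (1-r)^n: probability of at least one failure over n runs of rate r."""
    return 1.0 - (1.0 - p_fail_one_run) ** runs

print(p_any_failure(0.01, 4))   # ~0.039, i.e. about the 4% quoted for a 4x-longer run
```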

I like the thread of saving multiple residues at various checkpoints along
the way.  George suggested a % completion series.  I might suggest a
specific series of points -- like every L(1000k).  This might be simpler to
track in a database although the number of entries grows linearly with p so
the data storage might grow with p^2, depending.  Another series like
k*floor(p/s) would work just as well and keep the data needs smaller as it
would have just s+1 checkpoints (s can be fixed for all p).  All of the
steps saved should be saved for both 1st and 2nd run, as George suggested.
There is no point to stopping a 2nd run at the first difference although
there may be great value in starting a 3rd run as soon as possible after the
2nd fails to match the first.  If the third run pops up different from both
the 1st and 2nd run, primenet should send someone a cry for help:  too many
mismatches suggest something strangely wrong.

Might the v.17 problem have been trapped with something like this?  I do not
recall enough of the discussion to know and the ensuing belly-aching
overshadowed the real content of finding/fixing/reworking.  (I know I am
never going to rise high on the list, so I do not worry a whole lot about
how much my report shows.)

One way of testing a new version would be by double checking current and
prior version data.  In fact, I would expect that the quality assurance
group plans to use double-checking as a post-beta test stage.  The data base
saves could let a lot of us help out on that last stage before a full
release.  I know I would be happy to let my double-checking machines do new
version testing.

Joth



- Original Message -
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base.Com <[EMAIL PROTECTED]>
Sent: Wednesday, August 04, 1999 6:18 AM
Subject: RE: Mersenne: Multiple residues - enhancing double-checking


> > This had been discussed earlier.  Brian and I talked about it for a
little
> > while, he came up with the original idea.
>
> Doh!  Curse my memory! :-)
>
> > > I think the idea has definite merit.  If an error does occur,
> > it's equally
> > > likely to happen at any step along the way, statistically.
> > Errors are every
> > > bit as likely to happen on the very first iteration as they are
> > during the
> > > 50% mark, or the 32.6% mark, or on the very last iteration.
> >
> > True, but if the system is malfu

RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-04 Thread Aaron Blosser

> This had been discussed earlier.  Brian and I talked about it for a little
> while, he came up with the original idea.

Doh!  Curse my memory! :-)

> > I think the idea has definite merit.  If an error does occur,
> it's equally
> > likely to happen at any step along the way, statistically.
> Errors are every
> > bit as likely to happen on the very first iteration as they are
> during the
> > 50% mark, or the 32.6% mark, or on the very last iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.

Even more reason why it makes sense.

> > Just for example, every 10% along the way, it'll send it's
> current residue
> > to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue.  Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Just a partial residue, like the one sent at the end of the test.  Even a
smaller one, like a 32-bit instead of a 64-bit residue, seems like it would do
the job splendidly.

> > I forget the numbers being tossed around,
> > but you'd only save 50% of (the error rate) of the
> > checking time.
>
> As I pointed out above, the error rate should increase with the
> square of the
> exponent (plus change).  This means that if 1% have errors at
> 7mil, 22% will
> have errors at 30mil.

Frightening to think so.  Are you sure the error rate increases?  Errors
seem like they'd show up more as a result of faulty hardware, to my
thinking.  I'd imagine that if a certain machine ran through about ten 10M
exponents error-free, it has a very high likelihood of running a single 20M
exponent error-free.




RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-04 Thread Oscar Fuentes

From:   Lucas Wiman  <[EMAIL PROTECTED]>
Subject:        RE: Mersenne: Multiple residues - enhancing double-checking
Date sent:  Wed, 4 Aug 1999 03:26:48 -0400 (EDT)


> True, but if the system is malfunctioning then the errors should start
> early.

If the program is buggy, too. (v17 ghost is here)

> > Just for example, every 10% along the way, it'll send it's current residue
> > to the Primenet server.
> 
> I'm guessing that you mean a certain amount of the residue.  Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Aaron has said the same as me, using better grammar :-) . 
Obviously, only a few bytes are needed.

> Of course, this would only help if we were running more one test for the
> same exponent at the same time [...]

The simultaneous checks are interesting mainly in the first 
stages of v19 or other "new" testers. If a complete double 
check at a new FFT size takes 6 months, then while the author waits 
for that result a lot of people will already be using the program. 

[...]
>  That is to say when
> one computer finishes to X%, it reports its 64-bit residue to primenet, and
> waits for the second computer working on the same LL test to do the same.
> Until the other (slower) computer reports in, the (faster) computer works on
> another exponent.

Not at all. The first-time check goes its own way, but reports 
partial residues to the coordinator / primenet from time to time. Later, 
often long after the first LL test was finished, somebody 
receives:

Double-Check: 
M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

The partial residues correspond to X iterations first, 2*X iterations 
second, etc.; X is fixed for all participants. Or use percentages 
instead, as Aaron said. An absence of a residue between two 
commas means that it is unknown. With such a long and sensitive data 
stream (200 bytes typical), the CRC is a must to detect accidental 
modifications of the worktodo file.
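
A rough sketch of parsing such a line and verifying the trailing CRC; the
exact field layout and the use of CRC-32 are my assumptions, not an existing
worktodo format:

```python
import zlib

def parse_double_check(line):
    """Split 'M<exp>,<bits>,<res1>,<res2>,...,CRC:<crc>.' into its parts."""
    body, crc_field = line.rstrip(".").rsplit(",", 1)
    assert crc_field.startswith("CRC:")
    stored_crc = int(crc_field[4:], 16)
    if zlib.crc32(body.encode()) != stored_crc:
        raise ValueError("worktodo line was modified or corrupted")

    fields = body.split(",")
    exponent = int(fields[0].lstrip("M"))
    residue_bits = int(fields[1])
    # Empty fields mean "partial residue unknown for that checkpoint".
    residues = [int(f, 16) if f else None for f in fields[2:]]
    return exponent, residue_bits, residues

line = "M23780981,64,863FF87,678676AA,,FF637BC"
line += ",CRC:%X." % zlib.crc32(line.encode())
print(parse_double_check(line))
```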

This schema makes possible simultaneous checking, 
though. But the start-stop mechanism you describe has little 
sense.

> This would speed up the entire project, but it would slow down the individual
> exponent, which would make people mad :(.

My schema would only speed up double checking. The first-
time LL test process is untouched, except for the reporting of 
intermediate partial residues.


> As I pointed out above, the error rate should increase with the square of the
> exponent (plus change).  This means that if 1% have errors at 7mil, 22% will
> have errors at 30mil.

That's the issue with the idea: an estimate of the time saved 
(on double checking!) should be made. The authors of tester apps 
should also weigh the early bug-catching features of this schema. 
Then, decide whether it's worth it or not.

I'm doing double checking. I think about my computer 
running for 6 months when the error happened in the second week. Argh!

Saludos
Oscar Fuentes.

P.S.: I promise to improve my English ;-)




RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-04 Thread Lucas Wiman

> This idea is rather obvious, and no, I don't remember seeing it either.

This had been discussed earlier.  Brian and I talked about it for a little
while, he came up with the original idea.

> I think the idea has definite merit.  If an error does occur, it's equally
> likely to happen at any step along the way, statistically.  Errors are every
> bit as likely to happen on the very first iteration as they are during the
> 50% mark, or the 32.6% mark, or on the very last iteration.

True, but if the system is malfunctioning then the errors should start
early.

> Especially as the exponents get larger and larger, I see a *definite*
> possibility to reduce double check times by having first time LL tests
> report residues at certain "percentages" along the way.

Yeah.  The error rate should be proportional to the runtime, which increases
with the square of the exponent (ouch!).

> Just for example, every 10% along the way, it'll send it's current residue
> to the Primenet server.

I'm guessing that you mean a certain amount of the residue.  Sending in
ten 2-MB files for *each* exponent in the 20,000,000 range would get very
unwieldy, and inconvenient for people and primenet.

Of course, this would only help if we were running more than one test for the
same exponent at the same time (otherwise, this would just be a pointless
way to do a triple check).  They would either have to be coordinated
(running at the same time, a logistical nightmare), or (as Brian suggested)
have a "pool" of exponents running on one computer.  That is to say, when
one computer finishes to X%, it reports its 64-bit residue to primenet, and
waits for the second computer working on the same LL test to do the same.
Until the other (slower) computer reports in, the (faster) computer works on
another exponent.

This would speed up the entire project, but it would slow down the individual
exponent, which would make people mad :(.

> I forget the numbers being tossed around,
> but you'd only save 50% of (the error rate) of the
> checking time.

As I pointed out above, the error rate should increase with the square of the
exponent (plus change).  This means that if 1% have errors at 7mil, 22% will
have errors at 30mil.

-Lucas Wiman



RE: Mersenne: Multiple residues - enhancing double-checking

1999-08-03 Thread Aaron Blosser

>   This idea is rather obvious, but I don't remember anybody
> mentioned it.

This idea is rather obvious, and no, I don't remember seeing it either.

>   The schema (exponent, residue) were good when a double
> check last a few days. But now, a lot of time can be saved on triple
> checks if the partial residue of every X iterations were logged. X
> could be a million or so. To avoid hard disk and network traffic
> bloating, take only 16 bits. The final residue would be as always, of
> course.
>
>   When a discrepance is found, the double check stops (and
> starts another exponent) while other machine (with different
> sofware?) begins the triple check up the offending point, where we
> knows (hopefully) who made the mistake. If was the first check, the
> double check machine continues the LLTest were it stopped, using
> the intermediate files. If the double check machine is wrong, the
> triple check now turns to be a double check.
>
>   Furthermore, it's reasonable to think that most errors on
> LLTests occurs early, due to buggy hardware. So, the average time-
> saving would be more than 50% for faulty results.

I think the idea has definite merit.  If an error does occur, it's equally
likely to happen at any step along the way, statistically.  Errors are every
bit as likely to happen on the very first iteration as they are during the
50% mark, or the 32.6% mark, or on the very last iteration.

Especially as the exponents get larger and larger, I see a *definite*
possibility to reduce double check times by having first time LL tests
report residues at certain "percentages" along the way.

Just for example, every 10% along the way, it'll send its current residue
to the Primenet server.

During the double check of this exponent, its residues along the way are
compared to the residues from the first-time check (which could presumably
be "checked out" along with the exponent itself).

What happens when there's a mismatch?  Well, first of all you've saved
yourself, I suppose on average, 50% of the time needed to run a full test.
Sometimes you'll notice a mismatch in the first 10% of the iterations,
saving a lot of time; sometimes you might not notice until the very last
iteration, but you get the idea.

Now, the question is, what do you do when there is a mismatch?  I'd guess
that the current double-check be put on hold, reported back to Primenet, and
reassigned to a different person for a "triple check" (different person
ensures a different machine runs the 3rd test...maybe not necessary, but a
darn good idea IMO).  It will check up to the point of mismatch, see which
one (the 1st or 2nd) it agrees with (perhaps neither...quadruple check
time), then continue on.

If the 3rd check agrees with the 1st, then the 3rd machine should finish up
and make sure there were no more errors in the 1st check.  However, if the
3rd check agrees with the 2nd check instead, then both the 2nd and 3rd
checkers should finish all iterations and check for total agreement.
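
A small illustrative sketch (names and structure are mine, not the client's)
of how stored partial residues let the third check stop at the first
divergence and pick a side:

```python
def first_divergence(residues_a, residues_b):
    """Index of the first checkpoint where the two runs disagree, or None."""
    for i, (a, b) in enumerate(zip(residues_a, residues_b)):
        if a != b:
            return i
    return None

def referee(first_run, second_run, third_run):
    """Decide whom the triple check agrees with at the divergence point."""
    i = first_divergence(first_run, second_run)
    if i is None:
        return "first and second runs agree; no triple check needed"
    if third_run[i] == first_run[i]:
        return f"3rd check agrees with 1st at checkpoint {i}; 3rd finishes the run"
    if third_run[i] == second_run[i]:
        return f"3rd check agrees with 2nd at checkpoint {i}; 2nd and 3rd run to completion"
    return f"no two runs agree at checkpoint {i}; quadruple check time"

print(referee([1, 2, 3, 4], [1, 2, 9, 9], [1, 2, 3, 4]))
```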

Maybe I'm overcomplicating things (definitely), but that's a rough guess as
to how it might work.

Obviously, this needs thinking, and Primenet would need to handle this sort
of stuff, along with the clients.

But just think how much time could be saved when the first and second tests
disagree.  If all goes well, the 1st and 2nd tests will match and it took no
longer than it takes now.  But in those sort of rare cases where a 3rd test
is needed, you've saved a LOT of time by finding the problem early on.

Like I said, as the exponents get larger, the payoff for doing this, in
terms of CPU time, will MORE than make up for the hassles of reorganizing
how Primenet and the clients work.

Agree?  Disagree?  Comments would be nice.

I like the idea, personally.  Good thinking Oscar.

Aaron
