Mersenne Digest V1 #609

Mersenne Digest Wed, 4 Aug 1999 16:33:27 -0700

Mersenne Digest       Wednesday, August 4 1999       Volume 01 : Number 609




----------------------------------------------------------------------

Date: Tue, 03 Aug 1999 12:57:28 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Mersenne: error 12029

This may have been answered before, but I got Primenet error 12029 when 
connecting manually.  What is this?

+----------------------------------------------+
| Jud "program first and think later" McCranie |
+----------------------------------------------+


_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 3 Aug 1999 11:36:29 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: error 12029

Same here, I think the messages are not getting through.  This is
intermittent, so I think that any of the usual suspects could apply (server
down, server too busy, weather too nice....)

I think of this the way I thought of GPF's under Windows 3.0:  "Oh,
###%^&#!!"

Found this in the FAQ (hey, I did not think to look either until I saw your
question):

<<What does PrimeNet error 12029 mean?

Can't find PrimeNet server. If you are using PrimeNet for the first time
through a proxy server or firewall, check your web browser settings for the
proxy address and port number, and make sure the primenet.ini file settings
match this.  Most often the 'http://' part of the ProxyHost URL is omitted,
or the port # is wrong, or the wrong proxy server is named.

If the error still happens, there may be some other issue preventing your
computer from connecting to PrimeNet.  E-mail us at [EMAIL PROTECTED]
about it.>>


Right now I am trying to figure out why one computer (just a P5MMX-166)
locks up every 5 minutes in one room and works fine (10.5 hours since last
boot running mprime95) on my test bench.  Not getting the primenet
connection to work reliably is really confusing.  (Is it real or is it
memo-wrecked?)

Joth



------------------------------

Date: Tue, 3 Aug 1999 13:38:12 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: error 12029

> Same here, I think the messages are not getting through.  This is
> intermittent, so I think that any of the usual suspects could apply (server
> down, server too busy, weather too nice....)

The entropia.com site seemed to be down some earlier today.  Back up now,
for the time anyway.  I think that's what done caused it :-)

> Right now I am trying to figure out why one computer (just a P5MMX-166)
> locks up every 5 minutes in one room and works fine (10.5 hours since last
> boot running mprime95) on my test bench.  Not getting the primenet
> connection to work reliably is really confusing.  (Is it real or is it
> memo-wrecked?)

Believe it or not, I've had countless problems with that and, remarkably, it
ended up being the fluorescent light fixtures near the desk.  It had me
going there, wondering why it works okay in the lab, but when it's on that
guy's desk, no go.

I've even seen a case where some other guy's computer in the next cube was
leaking so much RF that it must have been interfering or something.  All I
knew in that case was that moving or turning off the "suspicious" computer
made the other one more reliable.  Weird.

What's that Oriental art of placing things in a room or house to achieve
harmony?  Feng Shui or something like that? :-)

Aaron


------------------------------

Date: Tue, 3 Aug 1999 14:01:47 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: error 12029

----- Original Message -----
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base. Com <[EMAIL PROTECTED]>
Sent: Tuesday, August 03, 1999 12:38 PM
Subject: RE: Mersenne: error 12029


> > Same here, I think the messages are not getting through.  This is
> > intermittent, so I think that any of the usual suspects could apply (server
> > down, server too busy, weather too nice....)
>
> The entropia.com site seemed to be down some earlier today.  Back up now,
> for the time anyway.  I think that's what done caused it :-)

Sounds good.

> > Right now I am trying to figure out why one computer (just a P5MMX-166)
> > locks up every 5 minutes in one room and works fine (10.5 hours since last
> > boot running mprime95) on my test bench.  Not getting the primenet
> > connection to work reliably is really confusing.  (Is it real or is it
> > memo-wrecked?)
>
> Believe it or not, I've had countless problems with that and, remarkably, it
> ended up being the fluorescent light fixtures near the desk.  It had me
> going there, wondering why it works okay in the lab, but when it's on that
> guy's desk, no go.

Wild.  For me, the P5 is working when it is near my fluorescent.  But point
taken -- the interference could be anything.  I have been thinking that it
might be either room temperature or bad line filtering or a flaky power
supply or boards separating (linked to room temp again).  All are possible,
but the list is open-ended.

> I've even seen a case where some other guy's computer in the next cube was
> leaking so much RF that it must have been interfering or something.  All I
> knew in that case was that moving or turning off the "suspicious" computer
> made the other one more reliable.  Weird.

Maybe it is the monitor -- an old IBM 14" (or smaller), generally pretty
nice, but you never know.  The P5 parts and the system board in the nearby
computer where the hangs occur used to be in cases on top of each other in
my office.  They were my main desktop units for about a year.  The P5 had
glitches turning on, but once it turned on the video and HDD, all was
generally well.  I think the video and HDD not turning on might be system
board related, but the hanging is new.  Maybe they want to be closer
together!  (Symbiotic processing?)  Could also be as prosaic as some minor
corrosion that maybe I have futzed enough to chip.  Well, I guess I will
just have to run it back to the "remote" location and see if it takes up its
old habits.

My wife is developing a sincere distaste for my interior decorating sense:
Oh, you want the case attached to the chassis?  But it does so nicely as an
accent piece either alongside or upside-down on top...

> What's that Oriental art of placing things in a room or house to achieve
> harmony?  Feng Shui or something like that? :-)
>
> Aaron

Sounds good.  My old references describe feng-shui ("wind-water") as a
belief that we can calculate the good or evil in things by examining the
topology of the land.

Sounds like a good beginning to microwave circuitry to me.

Thanks.  "Ah, sweet mystery of life..."

Joth


------------------------------

Date: Wed, 4 Aug 1999 00:03:04 +2
From: "Oscar Fuentes" <[EMAIL PROTECTED]>
Subject: Mersenne: Multiple residues - enhancing double-checking

        Hello all.

        This idea is rather obvious, but I don't remember anybody 
mentioning it.

        The (exponent, residue) scheme was fine when a double 
check lasted a few days. But now, a lot of time can be saved on triple 
checks if a partial residue is logged every X iterations. X 
could be a million or so. To avoid bloating hard disk and network 
traffic, keep only 16 bits of each. The final residue would be reported 
as always, of course.

        When a discrepancy is found, the double check stops (and 
starts another exponent) while another machine (with different 
software?) runs the triple check up to the offending point, where we 
learn (hopefully) who made the mistake. If it was the first check, the 
double-check machine continues the LLTest where it stopped, using 
the intermediate files. If the double-check machine is wrong, the 
triple check now becomes the double check.

        Furthermore, it's reasonable to think that most errors in 
LLTests occur early, due to buggy hardware. So the average time 
saving would be more than 50% for faulty results.
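
A minimal sketch of the logging idea, in Python, for toy-sized exponents
only. The function names, the `fault_at` switch (which deliberately flips one
bit to imitate a hardware error) and the choice of the low 16 bits are
assumptions made for illustration; this is not how any real GIMPS client
stores or reports residues.

```python
def ll_partial_residues(p, interval, bits=16, fault_at=None):
    """Lucas-Lehmer test of M_p = 2**p - 1 that logs a truncated residue.

    Every `interval` iterations the low `bits` bits of the current value are
    recorded.  `fault_at` is a hypothetical test hook: at that iteration the
    low bit is flipped to imitate a one-off hardware error.
    Returns (log, final_value); M_p is prime exactly when final_value == 0.
    """
    m = (1 << p) - 1
    mask = (1 << bits) - 1
    s = 4
    log = []
    for i in range(1, p - 1):          # p - 2 squaring steps in total
        s = (s * s - 2) % m
        if i == fault_at:
            s ^= 1                     # injected error, for demonstration only
        if i % interval == 0:
            log.append((i, s & mask))
    return log, s


def first_mismatch(log_a, log_b):
    """Return the first logged iteration where two runs disagree, else None."""
    for (iteration, res_a), (_, res_b) in zip(log_a, log_b):
        if res_a != res_b:
            return iteration
    return None


good_log, good_final = ll_partial_residues(7, interval=2)
bad_log, _ = ll_partial_residues(7, interval=2, fault_at=3)
print(good_log, good_final)            # [(2, 67), (4, 111)] 0
print(first_mismatch(good_log, bad_log))   # 4
```

With only 16 bits per checkpoint, two different values can still agree by
chance (about one time in 65536), so a matching partial residue is strong
evidence rather than proof.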

        Saludos
                Oscar Fuentes


------------------------------

Date: Tue, 3 Aug 1999 17:31:15 -0600 
From: "Blosser, Jeremy" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: error 12029

> > > Right now I am trying to figure out why one computer (just a P5MMX-166)
> > > locks up every 5 minutes in one room and works fine (10.5 hours since last
> > > boot running mprime95) on my test bench.  Not getting the primenet
> > > connection to work reliably is really confusing.  (Is it real or is it
> > > memo-wrecked?)
> >
> > Believe it or not, I've had countless problems with that and, remarkably, it
> > ended up being the fluorescent light fixtures near the desk.  It had me
> > going there, wondering why it works okay in the lab, but when it's on that
> > guy's desk, no go.
>
> Wild.  For me, the P5 is working when it is near my fluorescent.  But point
> taken -- the interference could be anything.  I have been thinking that it
> might be either room temperature or bad line filtering or a flaky power
> supply or boards separating (linked to room temp again).  All are possible,
> but the list is open-ended.
> 

Well, if it's hot, I'd say room temp. Might want to check the fans etc. I had
a motherboard that refused to boot once, and it was because of a flaky power
supply, so that's always a possibility.

> 
> My wife is developing a sincere distaste for my interior decorating sense:
> Oh, you want the case attached to the chassis?  But it does so nicely as an
> accent piece either alongside or upside-down on top...
> 

Heh, hey Aaron, remember when I had that old 486 mounted on the wall in my
old "office"? The entire computer was mounted on the wall, power supply,
disk drives, etc. It was kinda funny, I think we did it cuz we were bored
one day. Oh well, that just reminded me of that...

> > What's that Oriental art of placing things in a room or house to achieve
> > harmony?  Feng Shui or something like that? :-)
> >
> > Aaron
>
> Sounds good.  My old references describe feng-shui ("wind-water") as a
> belief that we can calculate the good or evil in things by examining the
> topology of the land.
> 
> Sounds like a good beginning to microwave circuitry to me.
> 
> Thanks.  "Ah, sweet mystery of life..."
> 
> Joth
> 

Yeh, Feng-Shui is the art of placement of furniture and such. Like having
your desk facing out across the building it's in since it is a position of
"power" (you are overlooking the building that way). And other things like
hanging chimes in windows to ward off the bad spirits that get in thru the
windows, and the weird sand box thingies as well as the cool little
fountains and stuff.

From an aesthetic sense, I like Feng Shui...

------------------------------

Date: Tue, 3 Aug 1999 23:32:31 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

>       This idea is rather obvious, but I don't remember anybody
> mentioning it.

This idea is rather obvious, and no, I don't remember seeing it either.

>       The (exponent, residue) scheme was fine when a double
> check lasted a few days. But now, a lot of time can be saved on triple
> checks if a partial residue is logged every X iterations. X
> could be a million or so. To avoid bloating hard disk and network
> traffic, keep only 16 bits of each. The final residue would be reported
> as always, of course.
>
>       When a discrepancy is found, the double check stops (and
> starts another exponent) while another machine (with different
> software?) runs the triple check up to the offending point, where we
> learn (hopefully) who made the mistake. If it was the first check, the
> double-check machine continues the LLTest where it stopped, using
> the intermediate files. If the double-check machine is wrong, the
> triple check now becomes the double check.
>
>       Furthermore, it's reasonable to think that most errors in
> LLTests occur early, due to buggy hardware. So the average time
> saving would be more than 50% for faulty results.

I think the idea has definite merit.  If an error does occur, it's equally
likely to happen at any step along the way, statistically.  Errors are every
bit as likely to happen on the very first iteration as they are during the
50% mark, or the 32.6% mark, or on the very last iteration.

Especially as the exponents get larger and larger, I see a *definite*
possibility to reduce double check times by having first time LL tests
report residues at certain "percentages" along the way.

Just for example, every 10% along the way, it'll send its current residue
to the Primenet server.

During the double check of this exponent, its residues along the way are
compared to the residues from the first time check (which could presumably
be "checked out" along with the exponent itself).

What happens when there's a mismatch?  Well, first of all you've saved
yourself, I suppose on average, 50% of the time needed to run a full test.
Sometimes you'll notice a mismatch in the first 10% of the iterations,
saving a lot of time, sometimes you might not notice until the very last
iteration, but you get the idea.

Now, the question is, what do you do when there is a mismatch?  I'd guess
that the current double-check be put on hold, reported back to Primenet, and
reassigned to a different person for a "triple check" (different person
ensures a different machine runs the 3rd test...maybe not necessary, but a
darn good idea IMO).  It will check up to the point of mismatch, see which
one (the 1st or 2nd) it agrees with (perhaps neither...quadruple check
time), then continue on.

If the 3rd check agrees with the 1st, then the 3rd machine should finish up
and make sure there were no more errors in the 1st check.  However, if the
3rd check agrees with the 2nd check instead, then both the 2nd and 3rd
checkers should finish all iterations and check for total agreement.

Maybe I'm overcomplicating things (definitely), but that's a rough guess as
to how it might work.
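
The decision rule described above could be sketched as follows. This is
only an illustration in Python: the function name and the run labels
("second", "third", "fourth") are made up here, and nothing like this is
claimed to exist in Primenet.

```python
def arbitrate(first, second, third):
    """Decide which runs continue once the triple check reaches the
    checkpoint where the first and second runs disagreed.

    The arguments are the partial residues reported by each run at that
    checkpoint.  Returns the list of runs that should carry on to the end.
    """
    if third == first:
        # The second run was wrong; the third finishes alone and, in doing
        # so, double-checks the rest of the first run.
        return ["third"]
    if third == second:
        # The first run was wrong; the second and third both finish and
        # must agree on the final residue.
        return ["second", "third"]
    # Three different answers: quadruple-check time.
    return ["fourth"]


print(arbitrate(0x863F, 0x1234, 0x863F))   # ['third']
print(arbitrate(0x863F, 0x1234, 0x1234))   # ['second', 'third']
print(arbitrate(0x863F, 0x1234, 0xBEEF))   # ['fourth']
```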

Obviously, this needs thinking, and Primenet would need to handle this sort
of stuff, along with the clients.

But just think how much time could be saved when both first and second tests
disagree.  If all goes well, the 1st and 2nd tests will match and it took no
longer than it takes now.  But in those sort of rare cases where a 3rd test
is needed, you've saved A LOT of time by finding the problem early on.

Like I said, as the exponents get larger, the payoff for doing this, in
terms of CPU time, will MORE than make up for the hassles of reorganizing
how Primenet and the clients work.

Agree?  Disagree?  Comments would be nice.

I like the idea, personally.  Good thinking Oscar.

Aaron


------------------------------

Date: Wed, 04 Aug 1999 02:18:57 EDT
From: "Ethan M. O'Connor" <[EMAIL PROTECTED]>
Subject: Mersenne: Intermediate Residues and doublechecking

It seems to me that recording intermediate residues for
comparison during double checking is not going to save 
much time; although you'll save on average 50% of the 
doublechecking time for exponents that are mistested
(assuming even and uncorrelated distribution of the
point in checking that the error occurs), mistested
exponents are only a small fraction of the double 
checking work.. I forget the numbers being tossed around, 
but you'd only save 50% of (the error rate) of the 
checking time.
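
As a rough worked example of that estimate, in Python: the 1% error rate
below is a placeholder, not a measured figure, and the default of 0.5 encodes
the assumption that the mismatch shows up, on average, halfway through the
run.

```python
def expected_saving(error_rate, mean_detection_point=0.5):
    """Fraction of total double-checking time saved by stopping at the
    first mismatching intermediate residue.

    error_rate           -- fraction of first-time tests that are wrong
    mean_detection_point -- average fraction of the run completed when the
                            mismatch is noticed (0.5 if uniformly spread)
    """
    return error_rate * (1.0 - mean_detection_point)


# With a hypothetical 1% error rate, about half a percent of the total
# double-checking time is saved.
print(expected_saving(0.01))
```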

HOWEVER, there is a potential use for the intermediate
residues anyway... as larger and larger exponents start
getting tested, and algorithms start having to be modified
to support the testing of those big exponents, we may
run into a situation where a widespread subset of 
machines with some obscure hardware bug test a range
of exponents wrongly... using intermediate residues during
double checking would halve the time that would
pass before the widespread error became obvious, and
would reduce the amount of wasted work performed.

-Ethan O'Connor
 [EMAIL PROTECTED]

------------------------------

Date: Wed, 4 Aug 1999 03:26:48 -0400 (EDT)
From: Lucas Wiman  <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> This idea is rather obvious, and no, I don't remember seeing it either.

This had been discussed earlier.  Brian and I talked about it for a little
while; he came up with the original idea.

> I think the idea has definite merit.  If an error does occur, it's equally
> likely to happen at any step along the way, statistically.  Errors are every
> bit as likely to happen on the very first iteration as they are during the
> 50% mark, or the 32.6% mark, or on the very last iteration.

True, but if the system is malfunctioning then the errors should start
early.

> Especially as the exponents get larger and larger, I see a *definite*
> possibility to reduce double check times by having first time LL tests
> report residues at certain "percentages" along the way.

Yeah.  The error rate should be proportional to the runtime, which increases
with the square of the exponent (ouch!).

> Just for example, every 10% along the way, it'll send its current residue
> to the Primenet server.

I'm guessing that you mean a certain amount of the residue.  Sending in
10 2meg files for *each* exponent in the 20,000,000 range would get very
unwieldy, and inconvenient for people and primenet.

Of course, this would only help if we were running more than one test for the
same exponent at the same time (otherwise, this would just be a pointless
way to do a triple check).  They would either have to be coordinated
(running at the same time, logistical nightmare), or (as Brian suggested)
have a "pool" of exponents running on one computer.  That is to say when
one computer finishes to X%, it reports its 64-bit residue to primenet, and
waits for the second computer working on the same LL test to do the same.
Until the other (slower) computer reports in, the (faster) computer works on
another exponent.

This would speed up the entire project, but it would slow down the individual
exponent, which would make people mad :(.

> I forget the numbers being tossed around,
> but you'd only save 50% of (the error rate) of the
> checking time.

As I pointed out above, the error rate should increase with the square of the
exponent (plus change).  This means that if 1% have errors at 7mil, 22% will
have errors at 30mil.
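
A quick way to play with that scaling, as a Python sketch. It assumes the
chance of an error per unit of runtime stays constant and that runtime grows
as p^2 (optionally times log p for the "plus change"); the function name and
the `with_log` switch are illustrative only. Under these assumptions a 1%
rate at 7 million comes out in the high teens at 30 million, somewhat below
the 22% quoted above but in the same ballpark.

```python
import math


def scaled_error_rate(base_rate, p_base, p_new, with_log=False):
    """Scale a per-test error rate from exponent p_base to exponent p_new.

    Runtime is assumed to grow as p**2, or p**2 * log(p) when `with_log`
    is set.  A test at p_new is treated as `factor` back-to-back runs of
    the base length, each surviving with probability (1 - base_rate).
    """
    factor = (p_new / p_base) ** 2
    if with_log:
        factor *= math.log(p_new) / math.log(p_base)
    return 1.0 - (1.0 - base_rate) ** factor


print(scaled_error_rate(0.01, 7e6, 30e6))
print(scaled_error_rate(0.01, 7e6, 30e6, with_log=True))
```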

-Lucas Wiman

------------------------------

Date: Wed, 4 Aug 1999 14:31:24 +2
From: "Oscar Fuentes" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

From:                   Lucas Wiman  <[EMAIL PROTECTED]>
Subject:                RE: Mersenne: Multiple residues - enhancing double-checking
Date sent:              Wed, 4 Aug 1999 03:26:48 -0400 (EDT)


> True, but if the system is malfunctioning then the errors should start
> early.

        If the program is buggy, too. (v17 ghost is here)

> > Just for example, every 10% along the way, it'll send its current residue
> > to the Primenet server.
> 
> I'm guessing that you mean a certain amount of the residue.  Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

        Aaron has said the same as me, using better grammar :-) . 
Obviously, only a few bytes are needed.

> Of course, this would only help if we were running more than one test for the
> same exponent at the same time [...]

        Simultaneous checks are interesting only in the early 
stages of v19 or other "new" testers. When a complete double 
check takes 6 months at a new FFT size, a lot of people will already 
be using the program while the author waits for a result. 

[...]
>  That is to say when
> one computer finishes to X%, it reports its 64-bit residue to primenet, and
> waits for the second computer working on the same LL test to do the same.
> Until the other (slower) computer reports in, the (faster) computer works on
> another exponent.

        Not at all. The first-time check goes its own way, but reports 
partial residues to the coordinator / primenet from time to time. Later, 
often long after the first LLTest has finished, somebody 
receives:

Double-Check: 
M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

The partial residues correspond to X iterations first, 2*X iterations 
second, etc. X is fixed for all participants. Or use percentages 
instead, as Aaron said. A missing residue between two 
commas means that it is unknown. With such a long and sensitive data 
stream (200 bytes typical) the CRC is a must to detect accidental 
modifications of the Worktodo file.
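
To make the proposed line format concrete, here is a hedged Python sketch
that builds and validates such a line. Everything about it is an assumption
for illustration: the field layout simply mirrors the example above, the
second field (64 in the example) is carried through without interpretation,
and the CRC is taken to be a CRC-32 (via `zlib.crc32`) over the text before
",CRC:". No real client is known to use this format.

```python
import zlib


def format_line(exponent, second_field, residues):
    """Build 'M<exp>,<field>,<res1>,<res2>,...,CRC:<crc>'.

    `residues` holds ints, or None where a partial residue is unknown
    (rendered as an empty field between two commas).
    """
    fields = [f"M{exponent}", str(second_field)]
    fields += ["" if r is None else f"{r:X}" for r in residues]
    payload = ",".join(fields)
    crc = zlib.crc32(payload.encode("ascii"))
    return f"{payload},CRC:{crc:08X}"


def parse_line(line):
    """Check the CRC and return (exponent, second_field, residues)."""
    payload, sep, crc_text = line.rpartition(",CRC:")
    if not sep:
        raise ValueError("missing CRC field")
    if zlib.crc32(payload.encode("ascii")) != int(crc_text, 16):
        raise ValueError("CRC mismatch: line was modified")
    fields = payload.split(",")
    if not fields[0].startswith("M"):
        raise ValueError("exponent field must start with 'M'")
    exponent = int(fields[0][1:])
    second_field = int(fields[1])
    residues = [int(f, 16) if f else None for f in fields[2:]]
    return exponent, second_field, residues


line = format_line(23780981, 64, [0x863FF87, None, 0xFF637BC])
print(line)
print(parse_line(line))
```

A hand-edited line (say, one hex digit changed in a residue) fails the CRC
check and raises `ValueError` instead of silently feeding a wrong residue to
the comparison.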

        This scheme makes simultaneous checking possible, 
though. But the start-stop mechanism you describe makes little 
sense.

> This would speed up the entire project, but it would slow down the individual
> exponent, which would make people mad :(.

        My scheme would only speed up double checking. The first-
time LLTest process is untouched, except for the reporting of 
intermediate partial residues.


> As I pointed out above, the error rate should increase with the square of the
> exponent (plus change).  This means that if 1% have errors at 7mil, 22% will
> have errors at 30mil.

        That's the crux of the idea: an estimate of the time saved 
(on double checking!) should be made. The authors of tester apps 
should also estimate the early bug-catching value of this scheme. 
Then decide whether it's worth it or not.

        I'm doing double checking. I think about my computer 
running for 6 months when the error is on the second week. Argh!

        Saludos
                Oscar Fuentes.

P.S.: I promise to improve my English ;-)


------------------------------

Date: Wed, 4 Aug 1999 07:18:02 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> This had been discussed earlier.  Brian and I talked about it for a little
> while, he came up with the original idea.

Doh!  Curse my memory! :-)

> > I think the idea has definite merit.  If an error does occur, it's equally
> > likely to happen at any step along the way, statistically.  Errors are every
> > bit as likely to happen on the very first iteration as they are during the
> > 50% mark, or the 32.6% mark, or on the very last iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.

Even more reason why it makes sense.

> > Just for example, every 10% along the way, it'll send its current residue
> > to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue.  Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Just a partial residue, like the one sent at the end of the test.  Even
smaller ones, like a 32 bit instead of 64 bit residue seems like it would do
the job splendidly.

> > I forget the numbers being tossed around,
> > but you'd only save 50% of (the error rate) of the
> > checking time.
>
> As I pointed out above, the error rate should increase with the square of the
> exponent (plus change).  This means that if 1% have errors at 7mil, 22% will
> have errors at 30mil.

Frightening to think so.  Are you sure the error rate increases?  Errors
seem like they'd show up more as a result of faulty hardware, to my
thinking.  I'd imagine that if a certain machine ran through about 10 10M
exponents error free, it has a very high likelihood of running a single 20M
exponent error free.


------------------------------

Date: Wed, 4 Aug 1999 09:32:56 -0400
From: "R. Kevin Moore" <[EMAIL PROTECTED]>
Subject: Mersenne: intermediate checks

Here are some ideas:

1) You don't need to mail back all the intermediate residues to see if
they are matching - you only need to send a checksum, which could be
as small as a few hundred bytes!
2) Users could elect how often to save the residue, by % or by
iteration #, depending on their free hard drive space.
3) Users would send an intermediate residue -if- a checksum mismatch
is found during a double check.  (This only adds one additional
mailing of a residue to the total net-traffic, since work at that
iteration would halt on the double check machine until a
triple-checking of the exponent reaches the iteration in question.)
Then, if the triple-check residue matches one of the other two, the
testing or double-checking would continue - depending on which residue
matched which.
4) In the future, local groups who manage exponents with local servers
could be improved & optimized assuming local groups are connected with
a network, and the server has plenty of hard drive space.  The idea
would be for the local groups to self-coordinate the factoring, LL &
double-checking of exponents or groups of exponents, making use of
intermediate residues (which will be fraught with more & more errors
as exponents get larger).  The local server would assign work
according to speed & reliability, and would be able to pull a machine
off a job to expedite double or triple checking.  It would also be
able to take additional work on other exponents, and would know what
type of work can best be added into its work group based on its total
speed & reliability.

I believe the total average savings (in my 3rd point) would be
substantial - especially for large exponents.  Note that this SLOWS
the completion of an exponent, since it requires a wait for an
assignment of a triple check to point X iteration of an exponent, and
for results to be received back, until it's known if double checking
(or the original LL test - or neither) is able to restart.  However -
the savings come from efficiently using people's machines - machines
would continue to work on other assignments in the meantime.

The savings from my 4th point works into this idea - it would
eliminate the delay in assigning & starting the mid-point triple-check
(assuming the work group had a free / almost-free machine), so total
time for completed exponents wouldn't be hurt too much.

This process could help save work from mid-point gimp quitters, by
letting them at least send in their mid-point residue.  (Or is this
already done - I've never quit!)

Comments?

-Kevin Moore

PS.  someone just sent out a note with a similar comment, so I'll
leave it at that.  :)


------------------------------

Date: Wed, 4 Aug 1999 10:13:29 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Multiple residues - enhancing double-checking

It seems like there may be three (or more) kinds of problem:

- round-off error gets too large and creates an error (and other unknown
  software issues)
- specific hardware failures producing errors
- random occurrences producing errors

mprime95 and the numerous ports and alternatives in GIMPS (I am not trying
to be callous towards Mac or Unix/Linux users -- I am too ignorant to be
callous) seem well able to catch several intermittent bugs in software.  I
know that I am frequently (well, several times a month on one machine or
another) hit with SUMOUT errors.  mprime95 just picks up at an earlier point
and restarts.  Occasionally the sumout error repeats but it is rarely at the
same iteration.  Something goes wrong there, but I cannot tell precisely
what.  Often the problem seems to crop up when some other piece of software
is misbehaving -- perhaps there is some memory violation that Windows does
not trap (ah, gee, ya think?).

It seems likely that each of these events is pretty much independent.  Both
expected specific hardware failures and random occurrences seem proportional
to running time.  Each machine and environment would have its own
probabilities (hard to know in advance -- probably hard to know at all).
The possible software glitches seem to be proportional to iterations and, as
several comments observe, the machine (or processor) specific failure rates
may increase with CPU speed.  Still, I thought I saw that mprime95
algorithms for Mp run in something on the order of p^2 (log p) steps.
When p doubles, the LL runtime should go up by a factor slightly higher than
4 (asymptotically 4, and less than 4.2 at p>8 million).

Stirring this melange together, if we quadruple p (to get from a lot of
current testing up to 33 million) and double processor/RAM speed (well, my
fastest machine is a P-II 266Mhz with a 66Mhz bus -- moving to 550/100 might
double the throughput with available parts), the runtime would increase by
about a factor of 8.  I would naively expect the failure rate to increase by
a factor between 8 and 16, although it could be higher or lower because
those pesky probabilities all change.
Also, if the probability of a failure in any step is r and there are n
steps, then the probability of a failed run is really 1-(1-r)^n, which has
leading term nr (and the probability is less than nr, of course).  If we
expect a 1% failure rate at 8M and r stays fixed, then we expect about a 4%
failure rate at around 33M (.99^4 is about .96 = 1 - .04).
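
The arithmetic can be checked with a short Python sketch.  It assumes that
failures in different steps are independent with a fixed per-step rate r,
so the chance of at least one failure in n steps is 1-(1-r)^n.  It also
assumes one "step" per LL iteration and roughly p iterations per test.  The
function names are hypothetical.

```python
import math

def run_failure_probability(r, n):
    """P(at least one failure in n independent steps) = 1 - (1 - r)^n."""
    # log1p/expm1 keep precision when r is tiny and n is huge.
    return -math.expm1(n * math.log1p(-r))

def per_step_failure_rate(run_failure, n):
    """Invert the formula: the r that gives `run_failure` over n steps."""
    return -math.expm1(math.log1p(-run_failure) / n)

# Treat a whole 8M test as one unit with a 1% failure chance, then run 4 units.
print(run_failure_probability(0.01, 4))

# Same thing per iteration: 1% over 8 million steps, then four times as many.
r = per_step_failure_rate(0.01, 8_000_000)
print(run_failure_probability(r, 32_000_000))
```

Both lines print about 0.0394, a shade under the leading-term estimate
n*r = 0.04.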

I like the thread of saving multiple residues at various checkpoints along
the way.  George suggested a % completion series.  I might suggest a
specific series of points -- like every L(1000k).  This might be simpler to
track in a database although the number of entries grows linearly with p so
the data storage might grow with p^2, depending.  Another series like
k*floor(p/s) would work just as well and keep the data needs smaller as it
would have just s+1 checkpoints (s can be fixed for all p).  All of the
steps saved should be saved for both 1st and 2nd run, as George suggested.
There is no point to stopping a 2nd run at the first difference although
there may be great value in starting a 3rd run as soon as possible after the
2nd fails to match the first.  If the third run pops up different from both
the 1st and 2nd run, primenet should send someone a cry for help:  too many
mismatches suggest something strangely wrong.
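
A minimal Python sketch of the bookkeeping described above, under stated
assumptions: checkpoints sit at k*floor(p/s) for k = 0..s, each run is
stored as a list of truncated residues (one per checkpoint), and the
returned strings are hypothetical labels, not anything primenet actually
reports.

```python
def checkpoints(p, s):
    """Iterations k*floor(p/s) for k = 0..s, i.e. s+1 checkpoints."""
    step = p // s
    return [k * step for k in range(s + 1)]

def first_mismatch(run_a, run_b):
    """Index of the first checkpoint where two runs differ, else None.

    Assumes both runs recorded the same checkpoints in the same order.
    """
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i
    return None

def next_action(first, second, third=None):
    """Decide what to do once the second (and maybe third) run is in."""
    if first_mismatch(first, second) is None:
        return "match"
    if third is None:
        return "start third run"
    if (first_mismatch(third, first) is None
            or first_mismatch(third, second) is None):
        return "third run agrees with one earlier run"
    return "cry for help"

print(checkpoints(8027219, 10)[:3])
print(next_action([1, 2, 3], [1, 9, 3]))
```

Note that the last checkpoint, s*floor(p/s), can fall a few iterations
short of the end of the test, so the final residue would still be reported
separately.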

Might the v.17 problem have been trapped with something like this?  I do not
recall enough of the discussion to know and the ensuing belly-aching
overshadowed the real content of finding/fixing/reworking.  (I know I am
never going to rise high on the list, so I do not worry a whole lot about
how much my report shows.)

One way of testing a new version would be by double checking current and
prior version data.  In fact, I would expect that the quality assurance
group plans to use double-checking as a post-beta test stage.  The data base
saves could let a lot of us help out on that last stage before a full
release.  I know I would be happy to let my double-checking machines do new
version testing.

Joth



----- Original Message -----
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base. Com <[EMAIL PROTECTED]>
Sent: Wednesday, August 04, 1999 6:18 AM
Subject: RE: Mersenne: Multiple residues - enhancing double-checking


> > This had been discussed earlier.  Brian and I talked about it for a
> > little while, he came up with the original idea.
>
> Doh!  Curse my memory! :-)
>
> > > I think the idea has definite merit.  If an error does occur, it's
> > > equally likely to happen at any step along the way, statistically.
> > > Errors are every bit as likely to happen on the very first iteration
> > > as they are during the 50% mark, or the 32.6% mark, or on the very
> > > last iteration.
> >
> > True, but if the system is malfunctioning then the errors should start
> > early.
>
> Even more reason why it makes sense.
>
> > > Just for example, every 10% along the way, it'll send its current
> > > residue to the Primenet server.
> >
> > I'm guessing that you mean a certain amount of the residue.  Sending in
> > 10 2meg files for *each* exponent in the 20,000,000 range would get very
> > unwieldy, and inconvenient for people and primenet.
>
> Just a partial residue, like the one sent at the end of the test.  Even
> a smaller one, like a 32 bit instead of a 64 bit residue, seems like it
> would do the job splendidly.
>
> > > I forget the numbers being tossed around,
> > > but you'd only save 50% of (the error rate) of the
> > > checking time.
> >
> > As I pointed out above, the error rate should increase with the square
> > of the exponent (plus change).  This means that if 1% have errors at
> > 7mil, 22% will have errors at 30mil.
>
> Frightening to think so.  Are you sure the error rate increases?  Errors
> seem like they'd show up more as a result of faulty hardware, to my
> thinking.  I'd imagine that if a certain machine ran through about 10 10M
> exponent error free, it has a very high likelihood of running a single 20M
> exponent error free.


------------------------------

Date: Wed, 4 Aug 1999 13:44:38 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: intermediate checks

> Here are some ideas:
>
> 1) You don't need to mail back all the intermediate residues to see if
> they are matching - you only need to send a checksum, which could be
> as small as a few hundred bytes!

or just 4 bytes for a 32 bit residue.  Or why not 8 bytes for a nice,
"safe", 64 bit residue.

> 2) Users could elect how often to save the residue, by % or by
> iteration #, depending on their free hard drive space.

Better not let the user pick how often.  Better to just let the client
software "hardcode" at what points it saves residues, just to make sure
everyone is doing it the same way.

> This process could help save work from mid-point gimp quitters, by
> letting them at least send in their mid-point residue.  (Or is this
> already done - I've never quit!)

For someone to be able to pick up where someone else left off, it'd have to
send the whole intermediate file to some server.   Those files are getting
larger and larger...
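
To make the size difference concrete, here is a small Python sketch of a
Lucas-Lehmer test on a toy exponent that records a truncated residue at
checkpoints.  It assumes the truncated residue is simply the low-order bits
of the current LL value, and it counts p bits as the bare minimum needed to
resume a test.  The real save-file format is not described in this thread
and may well be larger.  All names are hypothetical.

```python
def ll_partial_residues(p, every, bits=64):
    """Lucas-Lehmer test on M(p) = 2^p - 1 for an odd prime p.

    Returns (iteration, truncated residue) pairs, recorded every `every`
    iterations and at the final iteration p-2.
    """
    m = (1 << p) - 1
    mask = (1 << bits) - 1
    s = 4
    saved = []
    for i in range(1, p - 1):          # iterations 1 .. p-2
        s = (s * s - 2) % m
        if i % every == 0 or i == p - 2:
            saved.append((i, s & mask))  # enough to compare, not to resume
    return saved

def full_state_bytes(p):
    """Lower bound on the bytes needed to resume: the full p-bit value."""
    return (p + 7) // 8

for iteration, residue in ll_partial_residues(127, 25):
    print(iteration, format(residue, "016X"))
print(full_state_bytes(20_000_000), "bytes vs 8 bytes per 64-bit residue")
```

The final residue for p = 127 is zero, which is what a prime M(p) looks
like.  Each checkpoint costs 8 bytes (or 4 with bits=32), while handing a
20,000,000-range test to someone else means moving at least 2.5 million
bytes.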


------------------------------

Date: Wed, 4 Aug 1999 15:01:39 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> I like the thread of saving multiple residues at various checkpoints along
> the way.  George suggested a % completion series.  I might suggest a
> specific series of points -- like every L(1000k).  This might be simpler to
> track in a database although the number of entries grows linearly with p so
> the data storage might grow with p^2, depending.  Another series like
> k*floor(p/s) would work just as well and keep the data needs smaller as it
> would have just s+1 checkpoints (s can be fixed for all p).

I think we could keep it simple by just saving every x% iteration's residue
(in truncated form).  Using a WAG of saving it every 10% along the way,
you'd only have 9 partial and 1 full residue when all is said and done.

So for an exponent like 8027219, you'd save the partial residue at the 10%
mark, i.e. the 802722nd iteration (rounding up or down as normal).  Of
course, the number of iterations differs slightly from the exponent (an LL
test of M(p) takes p-2 iterations), but whatever...you get the idea.
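
The rounding in that example can be sketched in a few lines of Python.
This assumes checkpoints at fixed percentages of some total iteration
count, rounded to the nearest iteration.  The function name is made up, and
the example feeds in the exponent itself as the total, as the paragraph
does.

```python
def percent_checkpoints(total_iterations, step_pct=10):
    """Iteration numbers at step_pct%, 2*step_pct%, ..., 100% of the run.

    Integer arithmetic with round-half-up, so every client that hardcodes
    the same rule lands on the same iterations.
    """
    return [(total_iterations * pct + 50) // 100
            for pct in range(step_pct, 101, step_pct)]

marks = percent_checkpoints(8027219)
print(marks[0])    # the 10% mark: 802722
print(len(marks))  # 9 partial checkpoints plus the final one: 10
```

Sticking to integers avoids any floating-point disagreement between
platforms about where a checkpoint falls.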

These partial residues could be sent (for the first LL test) during the
check-ins, or saved up and all sent at once when the test is done.

> Might the v.17 problem have been trapped with something like this?  I do not
> recall enough of the discussion to know and the ensuing belly-aching
> overshadowed the real content of finding/fixing/reworking.  (I know I am
> never going to rise high on the list, so I do not worry a whole lot about
> how much my report shows.)

I don't think so, since it produced the wrong data right from the start,
regardless.  A double-check on a different platform (non GIMPS code like
MacLucas) would have caught it earlier though I suppose.



------------------------------

Date: Wed, 4 Aug 1999 19:19:56 -0400 (EDT)
From: Lucas Wiman  <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

>>  That is to say when
>> one computer finishes to X%, it reports its 64-bit residue to primenet, and
>> waits for the second computer working on the same LL test to do the same.
>> Until the other (slower) computer reports in, the (faster) computer works on
>> another exponent.

>        Not at all. The first-time check goes its way, but reporting
>partial residues to coordinator / primenet from time to time. Later,
>often when first LLTest was finished long time ago, somebody
>receives:

>Double-Check:
>M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

This scheme makes almost no sense for normal double checking.  This is because
it would save *no* time at all.  Think about it, even if you identify that an
error occurred in the second week of a 3-month test, you still have to run it
to completion, and a third test must also be run.  (So 3 LL tests must still
be run if an error occurs).

>        This schema makes possible simultaneous checking,
> though. But the start-stop mechanism you describe has little
> sense.

The method that you describe would only allow simultaneous checking if
the computers were of equal speed, or if one kept working on the same
exponent, and the other computer kept getting further behind.  The scheme
that I described (and Brian thought up) would allow the computers to run
at the same exponent/time, while still keeping busy.  

Sorry about my bad english, and it's even my first language!

-Lucas

------------------------------

End of Mersenne Digest V1 #609
******************************