> >Yeah, but most of those are "silent mutations" in nonzero residues, not
> >errors in the purported primality result, right?
>
> Right. The odds heavily favor both mismatched residues being nonzero.
> A zero residue at the last iteration is what indicates primality.
Depends on what you mean. If a Mersenne number that has been
tested once really is prime, but the test "went wrong", then we
have a wrong result, i.e. calling the number composite when it isn't.
This is why double-checking is so important, if we want to find *all*
the Mersenne primes for exponents up to a given limit.
There *might* be one or two lurking somewhere in the mass of
exponents which have only been tested once.
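
To make the point concrete, here is a minimal sketch of the Lucas-Lehmer
test with a simulated one-bit error. The function name ll_residue and the
flip_at parameter are made up for illustration. This is plain big-integer
arithmetic, not the FFT code that Prime95 or MacLucasUNIX actually use.

```python
def ll_residue(p, flip_at=None):
    """Final Lucas-Lehmer residue for M_p = 2**p - 1, with p an odd prime.

    A residue of 0 means M_p is prime. flip_at is a hypothetical knob:
    if given, one bit of the running value is flipped after that
    iteration to imitate a hardware glitch.
    """
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):      # p - 2 iterations, numbered 1 .. p-2
        s = (s * s - 2) % m
        if i == flip_at:
            s ^= 1                 # simulated single-bit error
    return s


def runs_agree(first, second):
    """A double-check passes only if both final residues match."""
    return first == second


if __name__ == "__main__":
    good = ll_residue(13)              # M_13 = 8191 is prime
    bad = ll_residue(13, flip_at=10)   # same exponent, one flipped bit
    print("clean run residue:", good)
    print("faulty run residue:", bad)
    print("runs agree:", runs_agree(good, bad))
```

The faulty run ends with a nonzero residue, so a genuine prime would be
logged as composite, and only a second run exposes the mismatch.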
>
> >Also what causes the errors, bugs in the code?
>
> What I've seen most often is that prime95 and its relatives provide
> early warning of unreliable hardware, whether cpu, RAM module, or motherboard.
Usually caused by overheating - failed CPU fan, poor ventilation or
excessively hot environment, overclocking, poor thermal contact
between processor substrate and heatsink ...
>
> >Is work being done on
> >finding subtle errors in the software?
>
[... snip ...]
> I've volunteered to run on Intel, one or two exponents in each run length
> to try out the code before the bulk of the GIMPS effort is routinely being
> assigned exponents in the higher run lengths. Perhaps someone running a
> different architecture would be willing to double check them.
Ken, could you tell me which exponents you've run and which require
checking? I'll double-check them using MacLucasUNIX on a 533 MHz
Alpha system (approx. same speed as a PII-300 running Prime95).
I would have thought one exponent per run length would be enough,
provided our results agree. My Alpha system has ECC memory
etc., so it should be reasonably reliable.
[... snip ...]
> >Are some of them, in the case of
> >Prime95, caused by Winblows?
>
> Iteration: 1407235/5070277, ERROR: ILLEGAL SUMOUT
> Possible hardware failure, consult the readme file.
> Continuing from last save file.
>
> Possibly these are software, according to George's readme.txt, under the
> section Possible Hardware Failure
> If it is software, it is not necessarily the fault of Windows or Microsoft.
> Could be a bum driver not doing things it should.
There's a distinct *possibility* that *any* software running under
Win 9x could directly alter values in Prime95's workspace, since
Win 9x applications have access to the whole physical address
space. In Win NT (and linux), only kernel mode tasks can do this,
so the likelihood of memory being clobbered by a rogue application
is a great deal less.
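
As a rough illustration of how a clobbered value can be caught, here is
a sketch of a sum check on a squaring step. It assumes the squaring is
done as a cyclic convolution of a digit vector, in which case the sum of
the outputs must equal the square of the sum of the inputs. The names
cyclic_square and sum_check_ok are hypothetical, and this is not taken
from Prime95's source. It only shows the general idea behind a check of
the kind the SUMOUT message suggests.

```python
def cyclic_square(x):
    """Square a digit vector by direct cyclic convolution (O(n^2))."""
    n = len(x)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] += x[i] * x[j]
    return out


def sum_check_ok(inputs, outputs):
    """For a cyclic self-convolution, sum(outputs) == sum(inputs) ** 2."""
    return sum(inputs) ** 2 == sum(outputs)


if __name__ == "__main__":
    digits = [3, 1, 4, 1, 5, 9, 2, 6]
    result = cyclic_square(digits)
    print("clean result passes:", sum_check_ok(digits, result))

    result[3] ^= 1 << 4            # imitate another program clobbering a word
    print("clobbered result passes:", sum_check_ok(digits, result))
```

A check like this only says that something went wrong. It cannot tell
whether the cause was hardware, a driver or another application.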
>
> >(IOW, are there more errors per P90 CPU hour
> >among Winblows boxes than among mprime boxes?)
Don't know. It may be possible to get this info from George's
database, but it would take a fair bit of digging out.
> >I figure only a small percentage of participants have actual faulty
> >hardware, and that spurious cosmic ray bit flips are caught by checksumming
> >of some kind.
I think you'll find it's surprising how many systems become a great
deal more reliable if (a) the PSU is adjusted so that the supply rails
are accurate (+/- 5% errors are common), (b) the cooling is
improved (even turning down the room thermostat by a couple of
degrees can make a difference), (c) the system is clocked just a
few percent slower (especially if it has been overclocked to start
with).
Lots of users put up with flaky hardware; they get used to Windoze
locking up once in a while & just blame Bill Gates. He's not
*always* the guilty party.
Also, in my fairly extensive experience, systems that have been
well-handled from an electrostatic point of view tend to be reliable,
whereas those where people have changed memory etc. without
observing anti-static precautions tend to be flaky.
Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm