Re: Mersenne: P-1 on PIII or P4?

2003-03-11 Thread George Woltman
At 07:47 AM 3/11/2003 -0500, Richard Woods wrote:
 However, any difference in FFT size between a P4 and other CPU, because
 of SSE support/nonsupport, could make a difference to the algorithm
 because it _does_ take FFT size into account.

There was a bug in calculating the FFT size (bytes of memory consumed)
for the P4.  This bug caused the P-1 bounds selecting code to produce
different results than the x86 code.  This is a fairly benign bug and will be
fixed in version 23.3.
In case you care, the details are:  There is a global variable called FFTLEN
that is used in many places and is initialized by the FFT init routine.  The
routine to select the P-1 bounds is called before the FFT code is initialized.
Thus, the routine to calculate the number of bytes consumed by an FFT
cannot use the global variable FFTLEN.   In fact, that routine is passed
an argument - fftlen in lower case.   Well, you guessed it, in the P4 section
of the routine I referenced FFTLEN rather than fftlen.   The routine worked
fine once the FFT code was initialized - only the P-1 bounds selecting code
was affected.
BTW, the FFT size is more than FFT length * sizeof (double).  There are
various paddings thrown in for better cache usage.  Sadly, if I had just
used FFT length * sizeof (double) as an estimate for the size in selecting
the P-1 bounds, this bug would never have happened, and that estimate
is more than accurate enough for the purposes of selecting bounds.
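
The class of bug described above can be sketched in a few lines.  This is an
illustration only, not Prime95 source: the function names are hypothetical,
and the assumption that the global holds zero before initialisation is made
purely so the effect is visible.

```python
# Hypothetical sketch of the bug class; none of this is Prime95 source.
SIZEOF_DOUBLE = 8

# Global that is only meaningful after the FFT init routine has run.
# Assumed to be 0 beforehand, purely for illustration.
FFTLEN = 0


def fft_size_simple(fftlen):
    """Estimate bytes consumed using only the argument (no padding)."""
    return fftlen * SIZEOF_DOUBLE


def fft_size_buggy(fftlen):
    """Mistakenly reads the global FFTLEN instead of the argument fftlen."""
    return FFTLEN * SIZEOF_DOUBLE


# Before FFT init, the buggy version ignores its argument entirely.
before_init_buggy = fft_size_buggy(458752)
before_init_simple = fft_size_simple(458752)

# After "initialisation" the two agree, which is how such a bug stays hidden.
FFTLEN = 458752
after_init_buggy = fft_size_buggy(458752)
```

Only callers that run before initialisation, such as the bounds-selecting
code in the description above, would see the wrong size.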

---

Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.459 / Virus Database: 258 - Release Date: 2/25/2003


Re: Mersenne: P-1 on PIII or P4?

2003-03-10 Thread Nick Glover
At 13:05:41, Monday, 3/10/03, Brian J. Beesley wrote:
 I just tried Test=8907359,64,0 on two systems - an Athlon XP 1700+ and a
 P4-2533, both running mprime v23.2 with 384 MB memory configured (out of 512 
 MB total in the system). These were fresh installations, I did nothing apart 
 from adding SelfTest448Passed=1 to local.ini to save running the selftest.

 The Athlon system picked B1=105000, B2=1995000 whilst the P4 picked 
 B1=105000, B2=2126250. So it seems that P4 is picking a significantly but not 
 grossly higher B2 value.

 Yes, I checked, both systems are using 448K run length for this exponent 
 (though it's only just under the P4 crossover).

Maybe the P-1 bounds calculation accounts for the slightly slower than
normal iteration time that 8907359 would have on a P4 because of the roundoff
checking (since it is very close to the P4 512K FFT limit).

--

Nick Glover
[EMAIL PROTECTED]

It's good to be open-minded, but not so open that your brains fall out. - Jacob 
Needleman

_
Unsubscribe  list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers


Re: Mersenne: P-1 on PIII or P4?

2003-03-10 Thread Chris Marble
Daran wrote:
 
 I'd appreciate it if you
 or someone else could try starting a P-1 on the same exponent (not in one of
 the ranges where it would get a different FFT length) on two different
 machines, with the same memory allowed.

P4:
M8769809 completed P-1, B1=45000, B2=72, E=12, WY2: E2F4FF67
Memory allowed: 896MB (Machine has 1GB)

PIII:
M8769809 completed P-1, B1=45000, B2=72, E=12, WY2: E2F4FF67
Memory allowed: 990MB (Machine has 1 1/8GB)
-- 
  [EMAIL PROTECTED] - HMC UNIX Systems Manager


Re: Mersenne: P-1 on PIII or P4?

2003-03-10 Thread Daran
On Mon, Mar 10, 2003 at 09:05:41PM +, Brian J. Beesley wrote:

 On Monday 10 March 2003 07:49, Daran wrote:

 I just tried Test=8907359,64,0 on two systems - an Athlon XP 1700+ and a 
 P4-2533, both running mprime v23.2 with 384 MB memory configured (out of 512 
 MB total in the system). These were fresh installations, I did nothing apart 
 from adding SelfTest448Passed=1 to local.ini to save running the selftest.
 
 The Athlon system picked B1=105000, B2=1995000 whilst the P4 picked 
 B1=105000, B2=2126250. So it seems that P4 is picking a significantly but not 
 grossly higher B2 value.

My Duron 800 picks values identical to your Athlon with 384MB allowed.

No change at 400MB

At 420MB B2 goes up to 2021250, still lower than your B2 value.

At 504MB B2 remains at 2021250.

I don't think George's '1 or 2 extra temporaries' theory stands up.

 Regards
 Brian Beesley

Daran G.


Re: Mersenne: P-1 on PIII or P4?

2003-03-10 Thread George Woltman
At 01:16 AM 3/11/2003 +, Daran wrote:
 I don't think George's '1 or 2 extra temporaries' theory stands up.

Sure it does.  I fired up the debugger and the P4 has 5541 temporaries
and the x86 has 89 temporaries.

Hmmm, maybe I'd better look into it a little bit further.



Re: Mersenne: P-1 on PIII or P4?

2003-03-09 Thread Daran
On Fri, Mar 07, 2003 at 09:51:33PM -0800, Chris Marble wrote:

 Daran wrote:
  
  On Thu, Mar 06, 2003 at 08:12:31PM -0800, Chris Marble wrote:
  
   Daran wrote:

 I like my stats but I could certainly devote 1 machine out of 20 to this.

If you're going to use one machine to feed the others, then it won't harm
your stats at all.  Quite the contrary.

 Assume I've got 1GB of RAM.  Do the higher B2s mean I should use a P4 rather
 than a P3 for this task?

I don't know, because I don't know why it gets a higher B2.

B1 and B2 are supposed to be chosen by the client so that the cost/benefit
ratio is optimal.  Does this mean that P4s choose B2 values which are too
high?  Or does everything else choose values too low?  Or is there some
reason I can't think of, why higher values might be appropriate for a P4?

In fact, I'm not even sure it does get a higher B2 - the apparent difference
could be, as Brian suggested, due to differences between versions.  I don't
have access to a P4, so I can't do any testing, but I'd appreciate it if you
or someone else could try starting a P-1 on the same exponent (not in one of
the ranges where it would get a different FFT length) on two different
machines, with the same memory allowed.  You would not need to complete the
runs.  You could abort the tests as soon as they've reported their chosen
limits.

 Would I unreserve all the exponents that are already P-1 complete?
 If I don't change the DoubleCheck into Pfactor then couldn't I just let
 the exponent run and then sometime after P-1 is done move the entry and
 the 2 tmp files over to another machine to finish it off?

If you're going to feed your other machines from this one, then obviously
you won't need to unreserve the exponents they need.  But there's an easier
way to do this.  Put SequentialWorkToDo=0 in prime.ini, then, so long as it
never runs out of P-1 work to do, it will never start a first-time or
doublecheck LL, and there will be no temporary files to move.  I also
suggest putting SkipTrialFactoring=1 in prime.ini.

 That sounds like more work than I care to do...

I agree that with 20 boxes, the work would be onerous.

 ...I can see having 1 machine
 do P-1 on lots of double-checks.

That would be well worth it.  Since one box will *easily* feed the other
twenty or so, you will have to decide whether to unreserve the exponents you
P-1 beyond your needs, or occasionally let that box test (or start testing)
one.

You may find a better match between your rate of production of P-1 complete
exponents, and your rate of consumption, if you do first-time testing.

[...]

 As an mprime user I edit the local.ini file all the time.  Per your notes
 I upped *Memory to 466.

That will certainly help exponents below 9071000 on a P3, or 8908000 on a P4. 
The current DC level is now over 917, so I doubt this will help much,
(though of course, it won't harm, either).  I haven't tried.  I'm still
getting enough sub 9071000 expiries.

 -- 
   [EMAIL PROTECTED] - HMC UNIX Systems Manager

Daran G.


Re: Mersenne: P-1 on PIII or P4?

2003-03-07 Thread Daran
On Thu, Mar 06, 2003 at 08:12:31PM -0800, Chris Marble wrote:

 Daran wrote:
  
  Whichever machine you choose for P-1, always give it absolutely as much
  memory as you can without thrashing.  There is an upper limit to how much it
  will use, but this is probably in the gigabytes for exponents in even the
  current DC range.
 
 So I should use the PIII with 1 3/4 GB of RAM to do nothing but P-1.

This depends upon what it is you want to maximise.  If it's your effective
contribution to the project, then yes.  Absolutely!  This is what I do on my
Duron 800 with a 'mere' 1/2GB.  The idea is that the deep and efficient P-1
you do replaces the probably much less effective effort that the final
recipient of the exponent would otherwise have made (or not made, in the
case of a few very old clients that might still be running).  I've not done
any testing, but I'm pretty sure that it would be worthwhile to put any
machine with more than about 250MB available to exclusive P-1 use.

On the other hand, you will do your ranking in the producer tables no
favours if you go down this route.

 It's an
 older Xeon with 2MB cache.  Will that help too?

You'll have to ask George if there is a codepath optimised for this
processor.  But whether there is or there isn't, should affect only the
absolute speed, not the trade-off between P-1 and LL testing.

I can't see how a 2MB cache can do any harm, though.

 How would I do this?  I see the following in undoc.txt:
 
 You can do P-1 factoring by adding lines to worktodo.ini:
 Pfactor=exponent,how_far_factored,has_been_LL_tested_once
 For example, Pfactor=1157,64,0

Unfortunately, pure P-1 work is not supported by the Primenet server, so
requires a lot of ongoing administration by the user.  First you need to
decide which range of exponents is optimal for your system(s).  (I'll
discuss this below).  Then you need to obtain *a lot* of exponents in that
range to test.  I do roughly eighty P-1s on DC exponents in the ten days or
so it would take me to do a single LL.

The easiest way to get your exponents is probably to email George and tell
him what you want.  Alternatively, if the server is currently making
assignments in your desired range, then you could obtain them by setting
'Always have this many days of work queued up' to 90 days - manual
communication to get some exponents - cut and paste from worktodo.ini to a
worktodo.saved file - manual communication to get some more - cut and paste -
etc.  This is what I do.

The result of this will be a worktodo.saved file with a lot of entries that
look like this

DoubleCheck=8744819,64,0
DoubleCheck=8774009,64,1
...

(or 'Test=...' etc.)  Now copy some of these back to your worktodo.ini file,
delete every entry ending in a 1 (these are already P-1 complete),
change 'DoubleCheck=' or 'Test=' into 'Pfactor=', and change the 0 at the
end to a 1 if the assignment was a 'DoubleCheck'.
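
The editing steps above could be scripted along these lines.  This is a
minimal sketch, assuming every saved entry has exactly three comma-separated
fields as in the examples; the function name to_pfactor is made up for
illustration and is not part of Prime95 or mprime.

```python
def to_pfactor(lines):
    """Convert saved DoubleCheck=/Test= entries into Pfactor= entries."""
    out = []
    for line in lines:
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything unrecognised
        kind, _, rest = line.partition("=")
        fields = rest.split(",")
        if kind not in ("DoubleCheck", "Test") or len(fields) != 3:
            continue
        exponent, bits, pm1_done = fields
        if pm1_done == "1":
            continue  # entry ends in 1: already P-1 complete, so drop it
        # Last Pfactor field is has_been_LL_tested_once:
        # 1 for a DoubleCheck, 0 for a first-time Test.
        ll_tested_once = "1" if kind == "DoubleCheck" else "0"
        out.append(f"Pfactor={exponent},{bits},{ll_tested_once}")
    return out


saved = ["DoubleCheck=8744819,64,0", "DoubleCheck=8774009,64,1"]
converted = to_pfactor(saved)
```

With the two sample entries, only the first survives, as
Pfactor=8744819,64,1.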

When you next contact the server, any completed work will be reported, but
the assignments will not be unreserved, unless you act to make this happen. 
The easiest way to do this is to set 'Always have this many days of work
queued up' to 1 day, and copy your completed exponents from your
worktodo.saved file back to your worktodo.ini (not forgetting any that were
complete when you got them).  You do not need to unreserve exponents
obtained directly from George.

As I said, it's *a lot* of user administration.  It's not nearly as
complicated as it sounds, once you get into the routine, but it's definitely
not something you can set up, then forget about.

If you're willing to do all this, then there's another optimisation you
might consider.  Since it's only stage 2 that requires the memory, you could
devote your best machine(s) to this task, using your other boxes to feed
them by doing stage 1.  This is assuming that they're networked together. 
Moving multimegabyte data files via Floppy Disk Transfer Protocol is not
recommended.

[...]

  If you are testing an exponent which is greater than an entry in the fifth
  column, but less than the corresponding entry in the third column, then
  avoid using a P4.  This applies to all types of work.

Actually it's worse than this.  The limits are soft, so if you are testing
an exponent *slightly* less than an entry in column 5, or *slightly*
greater than one in column 3, then you should avoid a P4.


Choice of exponent range


Stage two's memory requirements are not continuous.  This remark is probably
best illustrated with an example: on my system, when stage 2-ing an exponent
in the range 777 through 9071000, the program uses 448MB.  If that much
memory isn't available, then it uses 241MB.  If that's out of range, then
the next level down is 199MB, and so on.  There are certainly usage levels
higher than I can give it.
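
The stepwise behaviour can be pictured as picking the largest level that
fits.  A rough sketch, using only the three levels quoted above for one
system and one exponent range; the function name is hypothetical, and the
lower levels covered by 'and so on' are not known here, so the function
returns None below 199MB.

```python
# Levels reported above for one system and one exponent range, in MB.
STAGE2_LEVELS_MB = [448, 241, 199]


def stage2_usage(available_mb, levels=STAGE2_LEVELS_MB):
    """Return the largest listed level that fits, or None if none does."""
    fitting = [lvl for lvl in levels if lvl <= available_mb]
    return max(fitting) if fitting else None
```

So allowing 447MB buys nothing over allowing 241MB in this range, while
going from 447MB to 448MB moves up a whole level.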

The benefits of using the higher memory levels are threefold.

1.  The algorithm runs faster.
2.  The program responds by deepening the search, 

Re: Mersenne: P-1 on PIII or P4?

2003-03-07 Thread Chris Marble
Daran wrote:
 
 On Thu, Mar 06, 2003 at 08:12:31PM -0800, Chris Marble wrote:
 
  Daran wrote:
   
   Whichever machine you choose for P-1, always give it absolutely as much
   memory as you can without thrashing.  There is an upper limit to how much it
   will use, but this is probably in the gigabytes for exponents in even the
   current DC range.
  
  So I should use the PIII with 1 3/4 GB of RAM to do nothing but P-1.
 
 This depends upon what it is you want to maximise.  If it's your effective
 contribution to the project, then yes.

I like my stats but I could certainly devote 1 machine out of 20 to this.
Assume I've got 1GB of RAM.  Do the higher B2s mean I should use a P4 rather
than a P3 for this task?

 Unfortunately, pure P-1 work is not supported by the Primenet server, so
 requires a lot of ongoing administration by the user.

 Alternatively, if the server is currently making
 assignments in your desired range, then you could obtain them by setting
 'Always have this many days of work queued up' to 90 days - manual
 communication to get some exponents - cut and paste from worktodo.ini to a
 worktodo.saved file - manual communication to get some more - cut and paste -
 etc.  This is what I do.
 
 The result of this will be a worktodo.saved file with a lot of entries that
 look like this
 
 DoubleCheck=8744819,64,0
 DoubleCheck=8774009,64,1
 ...
 
 (or 'Test=...' etc.)  Now copy some of these back to your worktodo.ini file,
 delete every entry ending in a 1 (These ones are already P-1 complete),
 change 'DoubleCheck=' or 'Test=' into 'Pfactor=', and change the 0 at the
 end to a 1 if the assignment was a 'DoubleCheck'.

Would I unreserve all the exponents that are already P-1 complete?
If I don't change the DoubleCheck into Pfactor then couldn't I just let
the exponent run and then sometime after P-1 is done move the entry and
the 2 tmp files over to another machine to finish it off?

 If you're willing to do all this, then there's another optimisation you
 might consider.  Since it's only stage 2 that requires the memory, you could
 devote your best machine(s) to this task, using your other boxes to feed
 them by doing stage 1.

That sounds like more work than I care to do.  I can see having 1 machine
do P-1 on lots of double-checks.

 A couple of other points:  You are limited in the CPU menu option to
 90% of physical memory, but this may be overridden by editing local.ini,
 where you can set available memory to physical memory less 8MB.

As an mprime user I edit the local.ini file all the time.  Per your notes
I upped *Memory to 466.
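
The two limits quoted above work out as follows.  A small sketch: the
function names are hypothetical, and rounding the 90% figure down to a whole
megabyte is an assumption.

```python
def menu_limit_mb(physical_mb):
    """Largest value the CPU menu option accepts: 90% of physical memory."""
    return physical_mb * 90 // 100  # rounding down is an assumption


def local_ini_limit_mb(physical_mb):
    """Largest value suggested for local.ini: physical memory less 8MB."""
    return physical_mb - 8
```

For a 512MB machine this gives 460MB through the menu and 504MB via
local.ini.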
-- 
  [EMAIL PROTECTED] - HMC UNIX Systems Manager


Re: Mersenne: P-1 on PIII or P4?

2003-03-06 Thread Daran
- Original Message -
From: Chris Marble [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, March 04, 2003 4:00 PM
Subject: Mersenne: P-1 on PIII or P4?

 I've got a couple of P4s that I can use on weekends.  I've been using them
 to finish off exponents that my PIIIs were working on.  Is that the right
 order?  P-1 on the PIII and then the rest on the P4.  I want to maximize
 my output.

Hmmm.  That's an intriguing question.

Based upon what I know of the algorithms involved, it *ought* to be the case
that you should do any P-1 work on the machine which can give it the most
memory, irrespective of processor type.

However, some time ago, I was given some information on the actual P-1
bounds chosen for exponents of various sizes, running on systems of various
processor/memory configurations.  It turns out that P4s choose *much deeper*
P-1 bounds than do other processors.  For example:

8233409,63,0,Robreid,done,,4,45,,Athlon,1.0/1.3,90
8234243,63,0,Robreid,done,,4,45,,Celeron,540,80
8234257,63,0,Robreid,done,,45000,742500,,P4,1.4,100

The last figure is the amount of available memory.  The differences between
80MB and 100MB, and between 8233409 and 8234257 are too small to account for
the near doubling in the B2 bound in the case of a P4.

Since I do not understand why this should be the case, I don't know for
certain, but it looks like a P4 is better for P-1.

Whichever machine you choose for P-1, always give it absolutely as much
memory as you can without thrashing.  There is an upper limit to how much it
will use, but this is probably in the gigabytes for exponents in even the
current DC range.  Memory is not relevant for factorisation, the actual LL
test, or stage 1 of the P-1.

It used to be the case that TF should be avoided on a P4, but that part of
this processor's code has been improved in recent versions, so I don't know
if this is still the case.  If you ever get an exponent that requires both
P-1 and extra TF, do the P-1 before the last bit of TF.  This doesn't alter
the likelihood of finding a factor, but if you do find one, on average you
will find it earlier, and for less work.
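
The ordering argument can be checked with a little arithmetic.  The costs
and probabilities below are made up for illustration, and the two methods
are treated as independent, which is a simplification.

```python
def expected_work(first, second):
    """Expected cost of running first, then second only if no factor found.

    Each step is a (cost, probability_of_finding_a_factor) pair.
    """
    cost1, p1 = first
    cost2, _ = second
    return cost1 + (1 - p1) * cost2


def chance_of_factor(a, b):
    """Chance that at least one step finds a factor (independence assumed)."""
    return 1 - (1 - a[1]) * (1 - b[1])


# Made-up figures: (relative cost, chance of finding a factor).
p_minus_1 = (1.0, 0.03)
last_tf_bit = (2.0, 0.015)

pm1_first = expected_work(p_minus_1, last_tf_bit)
tf_first = expected_work(last_tf_bit, p_minus_1)
```

The overall chance of a factor is the same either way; only the expected
work differs, and with these figures it is lower when the P-1 runs first.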

There are a number of ranges of exponent sizes where it is better to avoid
using P4s.  George posted the following table some time ago (Best viewed
with a fixed width font.)

FFT      v21       v22.8     v21 SSE2  v22.8 SSE2
262144   5255000   5255000   5185000   5158000
327680   6520000   6545000   6465000   6421000
393216   7760000   7779000   7690000   7651000
458752   9040000   9071000   8970000   8908000
524288   10330000  10380000  10240000  10180000
655360   12830000  12890000  12720000  12650000
786432   15300000  15340000  15160000  15070000
917504   17850000  17890000  17660000  17550000
1048576  20400000  20460000  20180000  20050000
1310720  25350000  25390000  25090000  24930000
1572864  30150000  30190000  29920000  29690000
1835008  35100000  35200000  34860000  34560000
2097152  40250000  40300000  39780000  39500000
2621440  50000000  50020000  49350000  49100000
3145728  59400000  59510000  58920000  58520000
3670016  69100000  69360000  68650000  68130000
4194304  79300000  79300000  78360000  77910000

If you are testing an exponent which is greater than an entry in the fifth
column, but less than the corresponding entry in the third column, then
avoid using a P4.  This applies to all types of work.
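
The rule can be written as a range check.  A minimal sketch covering only
the four smallest FFT lengths, with the fifth-column (v22.8 SSE2) and
third-column (v22.8) limits copied from the table; the function name is
hypothetical.

```python
# (column 5, column 3) pairs for FFT lengths 262144 through 458752.
P4_AVOID_RANGES = [
    (5158000, 5255000),
    (6421000, 6545000),
    (7651000, 7779000),
    (8908000, 9071000),
]


def avoid_p4(exponent, ranges=P4_AVOID_RANGES):
    """True if the exponent lies strictly between column 5 and column 3."""
    return any(low < exponent < high for low, high in ranges)
```

For example, avoid_p4(9000000) is True, while avoid_p4(8769809) is False
because that exponent sits between two ranges.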

Where the considerations discussed above conflict, I don't know what the
balance is between them.

HTH

 --
   [EMAIL PROTECTED] - HMC UNIX Systems Manager

Daran G.




Re: Mersenne: P-1 on PIII or P4?

2003-03-06 Thread Brian J. Beesley
On Thursday 06 March 2003 13:03, Daran wrote:

 Based upon what I know of the algorithms involved, it *ought* to be the
 case that you should do any P-1 work on the machine which can give it the
 most memory, irrespective of processor type.

... assuming the OS allows a single process to grab the amount of memory 
configured in mprime/Prime95 (this may not always be the case, at any rate 
under linux, even if adequate physical memory is installed.)

 However, some time ago, I was given some information on the actual P-1
 bounds chosen for exponents of various sizes, running on systems of various
 processor/memory configurations.  It turns out that P4s choose *much
 deeper* P-1 bounds than do other processors.  For example:

 8233409,63,0,Robreid,done,,4,45,,Athlon,1.0/1.3,90
 8234243,63,0,Robreid,done,,4,45,,Celeron,540,80
 8234257,63,0,Robreid,done,,45000,742500,,P4,1.4,100

 The last figure is the amount of available memory.  The differences between
 80MB and 100MB, and between 8233409 and 8234257 are too small to account
 for the near doubling in the B2 bound in the case of a P4.

Yes, that does seem odd. I take it the software version is the same?

The only thing that I can think of is that the stage 2 storage space for 
temporaries is critical for exponents around this size such that having 90 
MBytes instead of 100 MBytes results in a reduced number of temporaries, 
therefore a slower stage 2 iteration time, therefore a significantly lower 
B2 limit.

I note also that the limits being used are typical of DC assignments. For 
exponents a bit smaller than this, using a P3 with memory configured at 320 
MBytes (also no OS restriction and plenty of physical memory to support it) but 
requesting first test limits (Pfactor=exponent,tfbits,0) I'm getting B2 
~ 20 B1 e.g.

[Thu Mar 06 12:07:46 2003]
UID: beejaybee/Simon1, M7479491 completed P-1, B1=9, B2=1732500, E=4, 
WY1: C198EE63

The balance between stage 1 and stage 2 should not really depend on the 
limits chosen since the number of temporaries required is going to be 
independent of the limit, at any rate above an unrealistically small value.

Why am I bothering about this exponent? Well, both LL and DC are attributed to 
the same user... not really a problem, but somehow it feels better to either 
find a factor or have an independent triple-check when this happens!

Regards
Brian Beesley


Re: Mersenne: P-1 on PIII or P4?

2003-03-06 Thread Chris Marble
Daran wrote:
 
 Whichever machine you choose for P-1, always give it absolutely as much
 memory as you can without thrashing.  There is an upper limit to how much it
 will use, but this is probably in the gigabytes for exponents in even the
 current DC range.

So I should use the PIII with 1 3/4 GB of RAM to do nothing but P-1.  It's an
older Xeon with 2MB cache.  Will that help too?
How would I do this?  I see the following in undoc.txt:

You can do P-1 factoring by adding lines to worktodo.ini:
Pfactor=exponent,how_far_factored,has_been_LL_tested_once
For example, Pfactor=1157,64,0


 There are a number of ranges of exponent sizes where it is better to avoid
 using P4s.  George posted the following table some time ago (Best viewed
 with a fixed width font.)
 
 FFT      v21       v22.8     v21 SSE2  v22.8 SSE2
 262144   5255000   5255000   5185000   5158000
 327680   6520000   6545000   6465000   6421000
 393216   7760000   7779000   7690000   7651000
 458752   9040000   9071000   8970000   8908000
 524288   10330000  10380000  10240000  10180000
 655360   12830000  12890000  12720000  12650000
 786432   15300000  15340000  15160000  15070000
 917504   17850000  17890000  17660000  17550000
 1048576  20400000  20460000  20180000  20050000
 1310720  25350000  25390000  25090000  24930000
 1572864  30150000  30190000  29920000  29690000
 1835008  35100000  35200000  34860000  34560000
 2097152  40250000  40300000  39780000  39500000
 2621440  50000000  50020000  49350000  49100000
 3145728  59400000  59510000  58920000  58520000
 3670016  69100000  69360000  68650000  68130000
 4194304  79300000  79300000  78360000  77910000
 
 If you are testing an exponent which is greater than an entry in the fifth
 column, but less than the corresponding entry in the third column, then
 avoid using a P4.  This applies to all types of work.

Useful info.  I've got 2 DCs in one of the ranges but one computer's a PIII
and the other's a Dec Alpha running Mlucas-2.7b-gen-5x.
-- 
  [EMAIL PROTECTED] - HMC UNIX Systems Manager


Mersenne: P-1 on PIII or P4?

2003-03-04 Thread Chris Marble
I've got a couple of P4s that I can use on weekends.  I've been using them
to finish off exponents that my PIIIs were working on.  Is that the right
order?  P-1 on the PIII and then the rest on the P4.  I want to maximize
my output.
-- 
  [EMAIL PROTECTED] - HMC UNIX Systems Manager
  My opinions are my own and probably don't represent anything anyway.