Re: Mersenne: P-1 on PIII or P4?
At 07:47 AM 3/11/2003 -0500, Richard Woods wrote:
> However, any difference in FFT size between a P4 and other CPU, because of SSE support/nonsupport, could make a difference to the algorithm because it _does_ take FFT size into account.

There was a bug in calculating the FFT size (bytes of memory consumed) for the P4. This bug caused the P-1 bounds selecting code to produce different results than the x86 code. This is a fairly benign bug and will be fixed in version 23.3.

In case you care, the details are: There is a global variable called FFTLEN that is used in many places and is initialized by the FFT init routine. The routine to select the P-1 bounds is called before the FFT code is initialized. Thus, the routine to calculate the number of bytes consumed by an FFT cannot use the global variable FFTLEN. In fact, that routine is passed an argument - fftlen in lower case. Well, you guessed it, in the P4 section of the routine I referenced FFTLEN rather than fftlen. The routine worked fine once the FFT code was initialized - only the P-1 bounds selecting code was affected.

BTW, the FFT size is more than FFT length * sizeof (double). There are various paddings thrown in for better cache usage. Sadly, if I had just used FFT length * sizeof (double) as an estimate for the size in selecting the P-1 bounds, this bug never would have happened, and that estimate is more than accurate enough for the purposes of selecting bounds.

--- Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.459 / Virus Database: 258 - Release Date: 2/25/2003
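For anyone who wants to see the shape of this bug, here is a minimal sketch in Python rather than the client's own source. The function names are hypothetical, the global is assumed to be zero before FFT initialization, and the size is the plain FFT length * sizeof (double) estimate mentioned above, without the cache paddings.

```python
SIZEOF_DOUBLE = 8  # bytes in a double

# Stand-in for the global that the FFT init routine sets.
# Assumption: it is still 0 when the P-1 bounds code runs.
FFTLEN = 0


def fft_bytes_buggy(fftlen):
    """Mimics the bug: ignores the argument and reads the global."""
    return FFTLEN * SIZEOF_DOUBLE


def fft_bytes_fixed(fftlen):
    """Uses the argument, so it is correct before FFT initialization."""
    return fftlen * SIZEOF_DOUBLE


if __name__ == "__main__":
    length_448k = 448 * 1024  # the 448K run length discussed in this thread
    print(fft_bytes_buggy(length_448k))  # size seen by the bounds code (wrong)
    print(fft_bytes_fixed(length_448k))  # simple estimate: length * 8 bytes
```

Once the global has been initialized the two functions agree, which matches the observation that only the bounds-selecting code was affected.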
Re: Mersenne: P-1 on PIII or P4?
At 13:05:41, Monday, 3/10/03, Brian J. Beesley wrote:
> I just tried Test=8907359,64,0 on two systems - an Athlon XP 1700+ and a P4-2533, both running mprime v23.2 with 384 MB memory configured (out of 512 MB total in the system). These were fresh installations, I did nothing apart from adding SelfTest448Passed=1 to local.ini to save running the selftest.
>
> The Athlon system picked B1=105000, B2=1995000 whilst the P4 picked B1=105000, B2=2126250. So it seems that P4 is picking a significantly but not grossly higher B2 value.
>
> Yes, I checked, both systems are using 448K run length for this exponent (though it's only just under the P4 crossover).

Maybe the P-1 bounds calculation accounts for the slightly slower than normal iteration time that 8907359 would have on a P4 because of the roundoff checking (since it is very close to the P4 512K FFT limit).

-- Nick Glover [EMAIL PROTECTED]
It's good to be open-minded, but not so open that your brains fall out. - Jacob Needleman
_ Unsubscribe list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: P-1 on PIII or P4?
Daran wrote:
> I'd appreciate it if you or someone else could try starting a P-1 on the same exponent (not in one of the ranges where it would get a different FFT length) on two different machines, with the same memory allowed.

P4: M8769809 completed P-1, B1=45000, B2=72, E=12, WY2: E2F4FF67
Memory allowed: 896MB (Machine has 1GB)

PIII: M8769809 completed P-1, B1=45000, B2=72, E=12, WY2: E2F4FF67
Memory allowed: 990MB (Machine has 1 1/8GB)

-- [EMAIL PROTECTED] - HMC UNIX Systems Manager
Re: Mersenne: P-1 on PIII or P4?
On Mon, Mar 10, 2003 at 09:05:41PM +, Brian J. Beesley wrote:
> On Monday 10 March 2003 07:49, Daran wrote:
>
> I just tried Test=8907359,64,0 on two systems - an Athlon XP 1700+ and a P4-2533, both running mprime v23.2 with 384 MB memory configured (out of 512 MB total in the system). These were fresh installations, I did nothing apart from adding SelfTest448Passed=1 to local.ini to save running the selftest.
>
> The Athlon system picked B1=105000, B2=1995000 whilst the P4 picked B1=105000, B2=2126250. So it seems that P4 is picking a significantly but not grossly higher B2 value.

My Duron 800 picks values identical to your Athlon with 384MB allowed. No change at 400MB. At 420MB B2 goes up to 2021250, still lower than your B2 value. At 504MB B2 remains at 2021250.

I don't think George's '1 or 2 extra temporaries' theory stands up.

> Regards
> Brian Beesley

Daran G.
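To put numbers on "significantly but not grossly higher", here is a small sketch that only does arithmetic on the bounds reported in this thread. The dictionary labels and the helper name are made up for illustration.

```python
B1 = 105000  # all three machines chose the same B1 for Test=8907359,64,0

# B2 values as reported in this thread (labels are illustrative only)
b2_choices = {
    "Athlon XP 1700+, 384MB": 1995000,
    "Duron 800, 420MB": 2021250,
    "P4-2533, 384MB": 2126250,
}


def b2_over_b1(b2, b1=B1):
    """Ratio of the stage 2 bound to the stage 1 bound."""
    return b2 / b1


def percent_higher(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    for machine, b2 in b2_choices.items():
        print(f"{machine}: B2/B1 = {b2_over_b1(b2):.2f}")
    p4 = b2_choices["P4-2533, 384MB"]
    athlon = b2_choices["Athlon XP 1700+, 384MB"]
    print(f"P4 B2 is {percent_higher(p4, athlon):.1f}% above the Athlon's")
```

On these figures the P4's B2 comes out roughly 6 to 7 percent above the Athlon's, and giving the Duron more memory closes only a small part of that gap.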
Re: Mersenne: P-1 on PIII or P4?
At 01:16 AM 3/11/2003 +, Daran wrote:
> I don't think George's '1 or 2 extra temporaries' theory stands up.

Sure it does. I fired up the debugger and the P4 has 5541 temporaries and the x86 has 89 temporaries. Hmmm, maybe I'd better look into it a little bit further.
Re: Mersenne: P-1 on PIII or P4?
On Fri, Mar 07, 2003 at 09:51:33PM -0800, Chris Marble wrote:
> Daran wrote:
>> On Thu, Mar 06, 2003 at 08:12:31PM -0800, Chris Marble wrote:
>>> Daran wrote:

> I like my stats but I could certainly devote 1 machine out of 20 to this.

If you're going to use one machine to feed the others, then it won't harm your stats at all. Quite the contrary.

> Assume I've got 1GB of RAM. Do the higher B2s mean I should use a P4 rather than a P3 for this task?

I don't know, because I don't know why it gets a higher B2. B1 and B2 are supposed to be chosen by the client so that the cost/benefit ratio is optimal. Does this mean that P4s choose B2 values which are too high? Or does everything else choose values too low? Or is there some reason I can't think of, why higher values might be appropriate for a P4?

In fact, I'm not even sure it does get a higher B2 - the apparent difference could be, as Brian suggested, due to differences between versions. I don't have access to a P4, so I can't do any testing, but I'd appreciate it if you or someone else could try starting a P-1 on the same exponent (not in one of the ranges where it would get a different FFT length) on two different machines, with the same memory allowed. You would not need to complete the runs. You could abort the tests as soon as they've reported their chosen limits.

> Would I unreserve all the exponents that are already P-1 complete? If I don't change the DoubleCheck into Pfactor then couldn't I just let the exponent run and then sometime after P-1 is done move the entry and the 2 tmp files over to another machine to finish it off?

If you're going to feed your other machines from this one, then obviously you won't need to unreserve the exponents they need. But there's an easier way to do this. Put SequentialWorkToDo=0 in prime.ini, then, so long as it never runs out of P-1 work to do, it will never start a first-time or doublecheck LL, and there will be no temporary files to move. I also suggest putting SkipTrialFactoring=1 in prime.ini.

> That sounds like more work than I care to do...

I agree that with 20 boxes, the work would be onerous.

> ...I can see having 1 machine do P-1 on lots of double-checks.

That would be well worth it. Since one box will *easily* feed the other twenty or so, you will have to decide whether to unreserve the exponents you P-1 beyond your needs, or occasionally let that box test (or start testing) one. You may find a better match between your rate of production of P-1 complete exponents, and your rate of consumption, if you do first-time testing.

[...]

> As an mprime user I edit the local.ini file all the time. Per your notes I upped *Memory to 466.

That will certainly help exponents below 9071000 on a P3, or 8908000 on a P4. The current DC level is now over 917, so I doubt this will help much (though of course, it won't harm, either). I haven't tried. I'm still getting enough sub 9071000 expiries.

> -- [EMAIL PROTECTED] - HMC UNIX Systems Manager

Daran G.
Re: Mersenne: P-1 on PIII or P4?
On Thu, Mar 06, 2003 at 08:12:31PM -0800, Chris Marble wrote:
> Daran wrote:
>> Whichever machine you choose for P-1, always give it absolutely as much memory as you can without thrashing. There is an upper limit to how much it will use, but this is probably in the gigabytes for exponents in even the current DC range.
>
> So I should use the PIII with 1 3/4 GB of RAM to do nothing but P-1.

This depends upon what it is you want to maximise. If it's your effective contribution to the project, then yes. Absolutely! This is what I do on my Duron 800 with a 'mere' 1/2GB. The idea is that the deep and efficient P-1 you do replaces the probably much less effective effort that the final recipient of the exponent would otherwise have made (or not made, in the case of a few very old clients that might still be running).

I've not done any testing, but I'm pretty sure that it would be worthwhile to put any machine with more than about 250MB available to exclusive P-1 use.

On the other hand, you will do your ranking in the producer tables no favours if you go down this route.

> It's an older Xeon with 2MB cache. Will that help too?

You'll have to ask George if there is a codepath optimised for this processor. But whether there is or there isn't should affect only the absolute speed, not the trade-off between P-1 and LL testing. I can't see how a 2MB cache can do any harm, though.

> How would I do this? I see the following in undoc.txt:
>
> You can do P-1 factoring by adding lines to worktodo.ini:
> Pfactor=exponent,how_far_factored,has_been_LL_tested_once
> For example, Pfactor=1157,64,0

Unfortunately, pure P-1 work is not supported by the Primenet server, so it requires a lot of ongoing administration by the user. First you need to decide which range of exponents is optimal for your system(s). (I'll discuss this below). Then you need to obtain *a lot* of exponents in that range to test. I do roughly eighty P-1s on DC exponents in the ten days or so it would take me to do a single LL.
The easiest way to get your exponents is probably to email George and tell him what you want. Alternatively, if the server is currently making assignments in your desired range, then you could obtain them by setting 'Always have this many days of work queued up' to 90 days - manual communication to get some exponents - cut and paste from worktodo.ini to a worktodo.saved file - manual communication to get some more - cut and paste - etc. This is what I do.

The result of this will be a worktodo.saved file with a lot of entries that look like this

DoubleCheck=8744819,64,0
DoubleCheck=8774009,64,1
...

(or 'Test=...' etc.)

Now copy some of these back to your worktodo.ini file, delete every entry ending in a 1 (these ones are already P-1 complete), change 'DoubleCheck=' or 'Test=' into 'Pfactor=', and change the 0 at the end to a 1 if the assignment was a 'DoubleCheck'.

When you next contact the server, any completed work will be reported, but the assignments will not be unreserved, unless you act to make this happen. The easiest way to do this is to set 'Always have this many days of work queued up' to 1 day, and copy your completed exponents from your worktodo.saved file back to your worktodo.ini (not forgetting any that were complete when you got them). You do not need to unreserve exponents obtained directly from George.

Like I said, it's *a lot* of user administration. It's not nearly as complicated as it sounds, once you get into the routine, but it's definitely not something you can set up, then forget about.

If you're willing to do all this, then there's another optimisation you might consider. Since it's only stage 2 that requires the memory, you could devote your best machine(s) to this task, using your other boxes to feed them by doing stage 1. This is assuming that they're networked together. Moving multimegabyte data files via Floppy Disk Transfer Protocol is not recommended.

[...]
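The worktodo editing steps described above can be sketched as a small script. This is only an illustration under stated assumptions: the function name is hypothetical, it is not part of mprime/Prime95, and it assumes every saved entry has exactly the three comma-separated fields shown in the examples.

```python
def to_pfactor(lines):
    """Convert saved 'DoubleCheck='/'Test=' entries into 'Pfactor=' entries.

    Entries whose last field is 1 are dropped (already P-1 complete).
    The new last field is 1 for a former DoubleCheck, 0 for a former Test.
    """
    out = []
    for line in lines:
        kind, sep, rest = line.strip().partition("=")
        if not sep or kind not in ("DoubleCheck", "Test"):
            continue  # skip blank lines and anything unrecognised
        fields = rest.split(",")
        if len(fields) != 3:
            continue  # assumption: exponent, how_far_factored, P-1 flag
        exponent, bits, p1_done = fields
        if p1_done == "1":
            continue  # P-1 already done, nothing to factor
        tested_once = "1" if kind == "DoubleCheck" else "0"
        out.append(f"Pfactor={exponent},{bits},{tested_once}")
    return out


if __name__ == "__main__":
    saved = [
        "DoubleCheck=8744819,64,0",
        "DoubleCheck=8774009,64,1",
        "Test=8907359,64,0",
    ]
    for entry in to_pfactor(saved):
        print(entry)
```

The script only produces the new lines. Copying them into worktodo.ini, and later restoring the original entries so the server can unreserve them, is still manual work.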
>> If you are testing an exponent which is greater than an entry in the fifth column, but less than the corresponding entry in the third column, then avoid using a P4. This applies to all types of work.

Actually it's worse than this. The limits are soft, so if you are testing an exponent *slightly* less than an entry in column 5, or *slightly* greater than one in column 3, then you should avoid a P4.

Choice of exponent range

Stage two's memory requirements are not continuous. This remark is probably best illustrated with an example: on my system, when stage 2-ing an exponent in the range 777 through 9071000, the program uses 448MB. If that much memory isn't available, then it uses 241MB. If that's out of range, then the next level down is 199MB, and so on. There are certainly usage levels higher than I can give it.

The benefits of using the higher memory levels are threefold.

1. The algorithm runs faster.
2. The program responds by deepening the search,
Re: Mersenne: P-1 on PIII or P4?
Daran wrote:
> On Thu, Mar 06, 2003 at 08:12:31PM -0800, Chris Marble wrote:
>> Daran wrote:
>>> Whichever machine you choose for P-1, always give it absolutely as much memory as you can without thrashing. There is an upper limit to how much it will use, but this is probably in the gigabytes for exponents in even the current DC range.
>>
>> So I should use the PIII with 1 3/4 GB of RAM to do nothing but P-1.
>
> This depends upon what it is you want to maximise. If it's your effective contribution to the project, then yes.

I like my stats but I could certainly devote 1 machine out of 20 to this. Assume I've got 1GB of RAM. Do the higher B2s mean I should use a P4 rather than a P3 for this task?

> Unfortunately, pure P-1 work is not supported by the Primenet server, so requires a lot of ongoing administration by the user.
>
> Alternatively, if the server is currently making assignments in your desired range, then you could obtain them by setting 'Always have this many days of work queued up' to 90 days - manual communication to get some exponents - cut and paste from worktodo.ini to a worktodo.saved file - manual communication to get some more - cut and paste - etc. This is what I do.
>
> The result of this will be a worktodo.saved file with a lot of entries that look like this
>
> DoubleCheck=8744819,64,0
> DoubleCheck=8774009,64,1
> ...
>
> (or 'Test=...' etc.)
>
> Now copy some of these back to your worktodo.ini file, delete every entry ending in a 1 (These ones are already P-1 complete), change 'DoubleCheck=' or 'Test=' into 'Pfactor=', and change the 0 at the end to a 1 if the assignment was a 'DoubleCheck'.

Would I unreserve all the exponents that are already P-1 complete? If I don't change the DoubleCheck into Pfactor then couldn't I just let the exponent run and then sometime after P-1 is done move the entry and the 2 tmp files over to another machine to finish it off?

> If you're willing to do all this, then there's another optimisation you might consider. Since it's only stage 2 that requires the memory, you could devote your best machine(s) to this task, using your other boxes to feed them by doing stage 1.

That sounds like more work than I care to do. I can see having 1 machine do P-1 on lots of double-checks.

> A couple of other points: You are limited in the CPU menu option to 90% of physical memory, but this may be overridden by editing local.ini, where you can set available memory to physical memory less 8MB.

As an mprime user I edit the local.ini file all the time. Per your notes I upped *Memory to 466.

-- [EMAIL PROTECTED] - HMC UNIX Systems Manager
Re: Mersenne: P-1 on PIII or P4?
- Original Message -
From: Chris Marble [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, March 04, 2003 4:00 PM
Subject: Mersenne: P-1 on PIII or P4?

> I've got a couple of P4s that I can use on weekends. I've been using them to finish off exponents that my PIIIs were working on. Is that the right order? P-1 on the PIII and then the rest on the P4. I want to maximize my output.

Hmmm. That's an intriguing question.

Based upon what I know of the algorithms involved, it *ought* to be the case that you should do any P-1 work on the machine which can give it the most memory, irrespective of processor type.

However, some time ago, I was given some information on the actual P-1 bounds chosen for exponents of various sizes, running on systems of various processor/memory configurations. It turns out that P4s choose *much deeper* P-1 bounds than do other processors. For example:

8233409,63,0,Robreid,done,,4,45,,Athlon,1.0/1.3,90
8234243,63,0,Robreid,done,,4,45,,Celeron,540,80
8234257,63,0,Robreid,done,,45000,742500,,P4,1.4,100

The last figure is the amount of available memory. The differences between 80MB and 100MB, and between 8233409 and 8234257 are too small to account for the near doubling in the B2 bound in the case of a P4.

Since I do not understand why this should be the case, I don't know for certain, but it looks like a P4 is better for P-1.

Whichever machine you choose for P-1, always give it absolutely as much memory as you can without thrashing. There is an upper limit to how much it will use, but this is probably in the gigabytes for exponents in even the current DC range. Memory is not relevant for factorisation, the actual LL test, or stage 1 of the P-1.

It used to be the case that TF should be avoided on a P4, but that part of this processor's code has been improved in recent versions, so I don't know if this is still the case. If you ever get an exponent that requires both P-1 and extra TF, do the P-1 before the last bit of TF. This doesn't alter the likelihood of finding a factor, but if you do find one, on average you will find it earlier, and for less work.

There are a number of ranges of exponent sizes where it is better to avoid using P4s. George posted the following table some time ago (Best viewed with a fixed width font.)

FFT      v21      v22.8    v21 SSE2  v22.8 SSE2
262144   5255000  5255000  5185000   5158000
327680   652      6545000  6465000   6421000
393216   776      7779000  769       7651000
458752   904      9071000  897       8908000
524288   1033     1038     1024      1018
655360   1283     1289     1272      1265
786432   1530     1534     1516      1507
917504   1785     1789     1766      1755
1048576  2040     2046     2018      2005
1310720  2535     2539     2509      2493
1572864  3015     3019     2992      2969
1835008  3510     3520     3486      3456
2097152  4025     4030     3978      3950
2621440  5000     5002     4935      4910
3145728  5940     5951     5892      5852
3670016  6910     6936     6865      6813
4194304  7930     7930     7836      7791

If you are testing an exponent which is greater than an entry in the fifth column, but less than the corresponding entry in the third column, then avoid using a P4. This applies to all types of work.

Where the considerations discussed above conflict, I don't know what the balance is between them.

HTH

> -- [EMAIL PROTECTED] - HMC UNIX Systems Manager

Daran G.
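The column 5 / column 3 rule can be written as a simple range check. The sketch below is an illustration, not client code: the function name is hypothetical, it assumes the FFT length counts as column 1 (so column 3 is v22.8 and column 5 is v22.8 SSE2), and it uses only the first four rows of the table.

```python
# (FFT length, v22.8 limit, v22.8 SSE2 limit), first four table rows only
CROSSOVERS = [
    (262144, 5255000, 5158000),
    (327680, 6545000, 6421000),
    (393216, 7779000, 7651000),
    (458752, 9071000, 8908000),
]


def p4_penalty_fft(exponent):
    """Return the FFT length whose range makes a P4 a poor choice, else None.

    An exponent above the SSE2 limit but below the non-SSE2 limit needs the
    next larger FFT on a P4 while other CPUs still use the smaller one.
    """
    for fftlen, x87_limit, sse2_limit in CROSSOVERS:
        if sse2_limit < exponent < x87_limit:
            return fftlen
    return None


if __name__ == "__main__":
    for p in (8907359, 9000000, 8769809):
        print(p, p4_penalty_fft(p))
```

With these rows, 9000000 falls inside the 458752 range, while 8907359 sits just under the 8908000 limit and is not flagged by the strict comparison. A stricter check would need to add some margin around each limit.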
Re: Mersenne: P-1 on PIII or P4?
On Thursday 06 March 2003 13:03, Daran wrote:
> Based upon what I know of the algorithms involved, it *ought* to be the case that you should do any P-1 work on the machine which can give it the most memory, irrespective of processor type.

... assuming the OS allows a single process to grab the amount of memory configured in mprime/Prime95 (this may not always be the case, at any rate under linux, even if adequate physical memory is installed.)

> However, some time ago, I was given some information on the actual P-1 bounds chosen for exponents of various sizes, running on systems of various processor/memory configurations. It turns out that P4s choose *much deeper* P-1 bounds than do other processors. For example:
>
> 8233409,63,0,Robreid,done,,4,45,,Athlon,1.0/1.3,90
> 8234243,63,0,Robreid,done,,4,45,,Celeron,540,80
> 8234257,63,0,Robreid,done,,45000,742500,,P4,1.4,100
>
> The last figure is the amount of available memory. The differences between 80MB and 100MB, and between 8233409 and 8234257 are too small to account for the near doubling in the B2 bound in the case of a P4.

Yes, that does seem odd. I take it the software version is the same?

The only thing that I can think of is that the stage 2 storage space for temporaries is critical for exponents around this size, such that having 90 MBytes instead of 100 MBytes results in a reduced number of temporaries, therefore a slower stage 2 iteration time, therefore a significantly lower B2 limit.

I note also that the limits being used are typical of DC assignments. For exponents a bit smaller than this, using a P3 with memory configured at 320 MBytes (also no OS restriction and plenty of physical memory to support it) but requesting first test limits (Pfactor=exponent,tfbits,0) I'm getting B2 ~ 20 B1, e.g.

[Thu Mar 06 12:07:46 2003] UID: beejaybee/Simon1, M7479491 completed P-1, B1=9, B2=1732500, E=4, WY1: C198EE63

The balance between stage 1 and stage 2 should not really depend on the limits chosen, since the number of temporaries required is going to be independent of the limit, at any rate above an unrealistically small value.

Why am I bothering about this exponent? Well, both LL and DC are attributed to the same user... not really a problem, but somehow it feels better to either find a factor or have an independent triple-check when this happens!

Regards
Brian Beesley
Re: Mersenne: P-1 on PIII or P4?
Daran wrote:
> Whichever machine you choose for P-1, always give it absolutely as much memory as you can without thrashing. There is an upper limit to how much it will use, but this is probably in the gigabytes for exponents in even the current DC range.

So I should use the PIII with 1 3/4 GB of RAM to do nothing but P-1. It's an older Xeon with 2MB cache. Will that help too?

How would I do this? I see the following in undoc.txt:

You can do P-1 factoring by adding lines to worktodo.ini:
Pfactor=exponent,how_far_factored,has_been_LL_tested_once
For example, Pfactor=1157,64,0

> There are a number of ranges of exponent sizes where it is better to avoid using P4s. George posted the following table some time ago (Best viewed with a fixed width font.)
>
> FFT      v21      v22.8    v21 SSE2  v22.8 SSE2
> 262144   5255000  5255000  5185000   5158000
> 327680   652      6545000  6465000   6421000
> 393216   776      7779000  769       7651000
> 458752   904      9071000  897       8908000
> 524288   1033     1038     1024      1018
> 655360   1283     1289     1272      1265
> 786432   1530     1534     1516      1507
> 917504   1785     1789     1766      1755
> 1048576  2040     2046     2018      2005
> 1310720  2535     2539     2509      2493
> 1572864  3015     3019     2992      2969
> 1835008  3510     3520     3486      3456
> 2097152  4025     4030     3978      3950
> 2621440  5000     5002     4935      4910
> 3145728  5940     5951     5892      5852
> 3670016  6910     6936     6865      6813
> 4194304  7930     7930     7836      7791
>
> If you are testing an exponent which is greater than an entry in the fifth column, but less than the corresponding entry in the third column, then avoid using a P4. This applies to all types of work.

Useful info. I've got 2 DCs in one of the ranges but one computer's a PIII and the other's a Dec Alpha running Mlucas-2.7b-gen-5x.

-- [EMAIL PROTECTED] - HMC UNIX Systems Manager
Mersenne: P-1 on PIII or P4?
I've got a couple of P4s that I can use on weekends. I've been using them to finish off exponents that my PIIIs were working on. Is that the right order? P-1 on the PIII and then the rest on the P4. I want to maximize my output.

-- [EMAIL PROTECTED] - HMC UNIX Systems Manager
My opinions are my own and probably don't represent anything anyway.