Re: Mersenne: Re: Trial Factoring: Back to the math
> Me as well. From what I understand the breakpoints to be, my K6-2 400
> would be put on first time LL's by default. I tried it on DC's, but the
> FPU is so poor that it is really only suited to factoring. My 466 Celeron
> could do LL's very slowly, but given the relative balance between LL's and
> DC's right now, I chose to put it on DC's. Unsexy work highly unlikely to
> find a prime, and yet still needs to be done.

I have four TF assignments cached on my P-200 that fall under the 2^66 bit
limit. This machine takes roughly one week to complete an assignment to
2^65. I don't know what I'm going to do with this machine after it starts
crunching TF on 18M. I did one in the past, and it took nearly a month :(
But I refuse to retire this machine. Let it crunch away until the mobo
fries. :)
Re: Mersenne: Re: Trial Factoring: Back to the math
On Sun, 17 Feb 2002 [EMAIL PROTECTED] wrote:

> Ah, but, factoring is actually moving ahead anyway in terms of
> CPU years required for LL to catch up - because larger exponents
> take more time to LL test. My impression is that the gap between
> the head of the factoring assignment queue and the head of the LL
> testing assignment queue has remained fairly steady at 4 to 5
> million for at least three years.

And every time we move up a bit level, the time required to factor an
exponent goes up as well. The gap is only 3.7M right now, and PrimeNet
seems to be having that problem with large numbers of factoring
assignments not being handed out correctly. I'm not meaning to be Chicken
Little here; I don't think there is any need for a "call to arms" on
factoring (since the balance of machines on each task can be somewhat
manipulated centrally). I just wanted to correct the misperception that
factoring was going faster than LL.

> Another point, surely, is that factoring assignments "suddenly
> slowed down" when we reached the changeover point between 65
> bit and 66 bit trial factoring depth in the low 18 millions.

The cutoff is at 17.85M if I remember correctly. I really only look at
"net assignments checked out" because that is an easy number to eyeball.
That rate won't be affected by a jump to a new factoring depth right away,
but it will cause some slowdown later on as machines given the bigger
assignments are slower to return them and check out new assignments. The
biggest change lately has been that LL assignments have slowed down
somewhat (and 33M assignments may have picked up, but those are a lot more
variable, and it is harder to eyeball a trend unless a record is kept).
Factoring assignments have picked up somewhat recently, but I don't think
this is a permanent trend. I suspect it is largely if not entirely due to
the challenge that one team is embarking upon, and that is going to last
only a month.

> Eh? The only PrimeNet problem I'm aware of in this respect is
> that it can't cope with having LL and DC assignments for the same
> exponent out at the same time.

It may have been fixed, but in the archives I saw references to PrimeNet
not properly expiring and then handing out again first time LL's if they
were in a range that had been overrun by DC's. If there is no PrimeNet
problem, then there is no problem with the DC range overrunning the LL
range at all.

> Well, some of us try - personally I have several systems which
> would be running LL assignments if left to "makes most sense"
> running DCs instead.

Me as well. From what I understand the breakpoints to be, my K6-2 400
would be put on first time LL's by default. I tried it on DC's, but the
FPU is so poor that it is really only suited to factoring. My 466 Celeron
could do LL's very slowly, but given the relative balance between LL's and
DC's right now, I chose to put it on DC's. Unsexy work highly unlikely to
find a prime, and yet still needs to be done.

> The difficulty here (it's purely a matter of perception, not a technical
> problem) is that, by pushing up the number of systems that get DC
> assignments by default, we risk alienating new users who find that
> they're not being given "exciting" assignments, and those with
> slower systems who find that even DC assignments are taking well
> over a month.

I don't think there is really anything wrong with the DC breakpoint as it
is right now. The spread between leading and trailing edge is not
excessive, and the leading edge is not that far behind the trailing edge
of LL.
It's pretty well balanced. And the point about "exciting" assignments is
well taken. There was a surge of new users after M39 was discovered, and
of those who did not stick around (and let their assignments expire), the
vast majority were doing first time LL. Most likely they were disappointed
in how long it takes to complete one and not willing to do less sexy but
less time consuming work. But in spite of those who dropped out, it is
evident that GIMPS picked up quite a few users who have stuck it out, so
the expiries from those dropping out are just the cost of recruiting new
users.

There's also the issue of all the myriad things that can happen between
the time an assignment is started and the time it ends. Today I went to
help my father set up his new 800MHz Duron, and I put Prime95 on it.
Despite the fact that it is quite capable of doing LL's, I put it on
factoring. His history of screwing up his machine (though I still love
him, bless his heart) doesn't bode well for anything but the shortest of
assignments. Longer assignments just don't survive those who tend to blow
away their boxes every now and then.

Which leads me to something else that might benefit factoring. Factoring
doesn't have huge save files like LL does. They're only 32 bytes. I'd
suggest that anytime a box makes an update of expected completion dates,
it send along that saved information. If the assign
RE: Assignment balancing (was: Re: Mersenne: Re: Trial Factoring: Back to the math)
Another thing people might want to consider... if you have a dual
processor box, have one of the Prime95 threads doing factoring while the
other does LL tests. I've been meaning to do that for a while now, and I
guess I might get around to it next week. I think most of the machines I
have testing are dual processor, so that should increase my factoring
output at least. :) I've also got my slower machines (under 500MHz or so)
already doing factoring... it goes back to something I've always said:
pick the right assignments for the right machines. :)

Aaron

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:mersenne-invalid-
> [EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
> Sent: Sunday, February 17, 2002 2:55 PM
> To: Russel Brooks
> Cc: [EMAIL PROTECTED]
> Subject: Assignment balancing (was: Re: Mersenne: Re: Trial Factoring:
> Back to the math)
>
> On 17 Feb 2002, at 17:54, Russel Brooks wrote:
>
> > Am I interpreting this thread correctly? That more factoring is
> > needed? My climb up the LL top producers is starting to stall
> > so maybe it's time to switch to factoring.
>
> I'm quite sure there's no need to panic!
>
> So far as I'm concerned, I've been underdoing factoring - less than
> 5% of my contribution. Theoretically the best balance for the
> project as a whole is about 10% of the CPU effort in trial factoring.
>
> The other point here is that we have had two major improvements in
> the efficiency of the LL testing algorithm since the last time trial
> factoring was seriously looked at, so some theoretical work on trial
> factoring is probably due.
>
> Why not try some factoring assignments, if _you_ think it will be
> fun for a change? You can always change back when you get
> bored...
Assignment balancing (was: Re: Mersenne: Re: Trial Factoring: Back to the math)
On 17 Feb 2002, at 17:54, Russel Brooks wrote:

> Am I interpreting this thread correctly? That more factoring is
> needed? My climb up the LL top producers is starting to stall
> so maybe it's time to switch to factoring.

I'm quite sure there's no need to panic!

So far as I'm concerned, I've been underdoing factoring - less than 5% of
my contribution. Theoretically the best balance for the project as a
whole is about 10% of the CPU effort in trial factoring.

The other point here is that we have had two major improvements in the
efficiency of the LL testing algorithm since the last time trial factoring
was seriously looked at, so some theoretical work on trial factoring is
probably due.

Why not try some factoring assignments, if _you_ think it will be fun for
a change? You can always change back when you get bored...

> I know the LL tests drive the FPU hard, what about factoring?

Factoring is a light FPU and memory load. LL (and DC) assignments are
heavy on both.

Indications to run factoring irrespective of CPU speed:

1) One CPU on a dual processor system. Both processors running LL tests
hits the memory bus too hard & ruins performance.

2) Some processors (AMD K6, and especially Cyrix 6x86) have inefficient
FPUs; it is much more effective to have these processors running trial
factoring.

3) A PIII or Athlon system running very close to the limit of the cooling
system. Because the load on the FPU imposed by factoring is much lower,
the power consumption of the CPU will be reduced if factoring is run
instead of LL testing, and the temperature will therefore drop
significantly.

4) A system very short of main memory. Trial factoring uses very little
memory for data work space.

Counterindications: it doesn't seem to be sensible to run trial factoring
on a P4. The P4 architecture is _very_ efficient at running LL tests, but
its performance on the instruction mix used in trial factoring is a great
deal worse (clock for clock) than PII/PIII/Athlon.

Regards
Brian Beesley
Re: Mersenne: Re: Trial Factoring: Back to the math
Am I interpreting this thread correctly? That more factoring is needed?
My climb up the LL top producers is starting to stall so maybe it's time
to switch to factoring.

I know the LL tests drive the FPU hard, what about factoring?

Cheers... Russ
Re: Mersenne: Re: Trial Factoring: Back to the math
On 16 Feb 2002, at 12:26, Mary Conner wrote:

> Trial factoring is well ahead of LL testing, but the gap is closing.
> Yesterday was the first day in a long time where the net number of
> checkouts for factoring exceeded those for first time LL's. That is due
> to the fact that one team is having a factoring challenge among their
> membership due to begin soon, and large numbers of factoring assignments
> are being checked out in preparation. It isn't a trend that is going to
> persist. Ongoing, more factoring assignments have to be done than LL
> assignments because factoring eliminates a substantial number of
> exponents from LL testing.

Ah, but, factoring is actually moving ahead anyway in terms of CPU years
required for LL to catch up - because larger exponents take more time to
LL test. My impression is that the gap between the head of the factoring
assignment queue and the head of the LL testing assignment queue has
remained fairly steady at 4 to 5 million for at least three years.

Nevertheless I've just tripled my factoring effort from a single PIII-500
by switching one CPU on a dual processor PIII-1000 system from ECM to
factoring. The other CPU is of course running LL tests.

> The situation has improved recently from two earlier surges in LL
> testing. One accompanied the discovery of M39, the other was a
> post-Christmas surge (GIMPS participants with shiny new fast computers
> replacing slower ones, presumably). Many of those who joined up after
> M39 have dropped out, and the crest of the wave of their expiries has
> passed (over 150 assignments started on Nov. 14th expired in one day,
> for instance). For awhile, the balance was around 300 net LL's checked
> out to 150 factoring assignments. That has narrowed to around 250 to
> 200, and sometimes comes fairly even, although to keep up, GIMPS needs
> more factoring assignments done than LL's.

Another point, surely, is that factoring assignments "suddenly slowed
down" when we reached the changeover point between 65 bit and 66 bit
trial factoring depth in the low 18 millions.

> The cutoffs for those computers set to "do the work that makes the most
> sense" can be adjusted to put more computers on trial factoring work if
> LL testing starts to overrun trial factoring. The cutoffs could also
> move some computers doing first time LL onto doing double checks,
> although too much would have the leading edge of DC's running into the
> trailing edge of LL's, and I understand that PrimeNet has difficulty
> with that situation.

Eh? The only PrimeNet problem I'm aware of in this respect is that it
can't cope with having LL and DC assignments for the same exponent out at
the same time.

> Right now DC checkouts are going very slowly, less than 100 a day most
> days.

Well, some of us try - personally I have several systems which would be
running LL assignments if left to "makes most sense" running DCs instead.

The difficulty here (it's purely a matter of perception, not a technical
problem) is that, by pushing up the number of systems that get DC
assignments by default, we risk alienating new users who find that
they're not being given "exciting" assignments, and those with slower
systems who find that even DC assignments are taking well over a month.

> Something that would likely help a lot would be to add factoring
> capability and PrimeNet access code to those programs that don't yet
> have it so that more computers can reasonably participate.

Yes.
There are also a lot of non-Intel un*x systems out there, most of which
would be happier with DC than LL assignments, that we could probably
attract if we had an automatic interface for obtaining assignments from
PrimeNet and submitting results of completed tests. At least two
reasonably efficient and reliable clients (Mlucas and Glucas) are already
available for a good range of host systems; I'm sure the perceived
complications of using the manual assignment forms are responsible for at
least some users not participating.

Regards
Brian Beesley
Re: Mersenne: Re: Trial Factoring: Back to the math
On 16 Feb 2002, at 15:42, George Woltman wrote:

> To put it in perspective, let's say you are looking for 64 bit factors
> of a 10,000,000 bit number.
>
> The basic algorithm is:
>     Multiply 156,250 trial factors together to form a 10,000,000 bit
>     number.
>     do {
>         Multiply next 156,250 trial factors making 2nd 10M bit number.
>         Multiply the two 10,000,000 bit numbers together mod 2^P-1
>     } while there are more 64-bit factors
>     GCD (10,000,000 bit product, 2^P-1)

My suggested modification groups things a bit differently: by and large
we're working in powers of 2, so the first step of the inner loop would go
something like

    Multiply 131072 64-bit numbers together in pairs
    Multiply the resulting 65536 128-bit products together in pairs
    ...
    Multiply the resulting four 2097152-bit products together in pairs
    Multiply the resulting two 4194304-bit products together to form one
        8388608-bit block product
    Multiply the block product into the running 10M-bit product and
        reduce modulo 2^P-1
    Repeat the above for the next block of 131072 64-bit numbers

etc. etc. So for each block of 131072 numbers we have

    65536 64-bit multiplications
    32768 128-bit multiplications
    16384 256-bit multiplications
    8192 512-bit multiplications
    4096 1024-bit multiplications
    2048 2048-bit multiplications
    1024 4096-bit multiplications
    512 8192-bit multiplications
    256 16384-bit multiplications
    128 32768-bit multiplications
    64 65536-bit multiplications
    32 131072-bit multiplications
    16 262144-bit multiplications
    8 524288-bit multiplications
    4 1048576-bit multiplications
    2 2097152-bit multiplications
    1 4194304-bit multiplication

and one 8388608 x 10M bit multiplication modulo 2^p-1. There seems to be
about the same amount of work in each level; nevertheless the lower levels
should be faster, since even FFT is O(n log n) and we can probably do
better than FFT for the very smallest products.

> Looking at the inner loop, the second step uses the fast LL multiply
> code. On my P4-1400 this takes 0.06 seconds or 84 million clocks, or
> (dividing by 156,250) 538 clocks per trial factor. This is quite
> promising. I haven't measured it but the current code can test one
> trial factor in about 4000 clocks.

My scheme has 17 levels, so the upper bound of the _average_ time per
trial factor should be 17 * 538 clocks. This is too darned slow but, as I
indicated above, the actual execution time should be significantly less
than that. But I thought the time to actually test a 64-bit factor - not
counting sieving overheads - was less than 2000 clocks???

> The stumbling block is how we can quickly do the first step of the
> inner loop - multiplying 156,250 64-bit trial factors together. If
> someone has a clever algorithm that can do this in fewer than 3500
> clocks per trial factor then we may have an improved factoring
> algorithm!

The other point I was trying to make is that a similar algorithm could be
used to compute the product of small primes used by P-1 stage 1 (and
maybe stage 2 as well) with a highly significant improvement over the
current method. There may also be an application in ECM.

Regards
Brian Beesley
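For concreteness, here is a minimal Python sketch of the balanced pairwise
scheme described above, with Python's built-in big integers standing in
for FFT-based multiplication; the function names, block size and the
example candidates are illustrative only, not production values:

    def block_product(candidates):
        # Multiply a list of candidate factors pairwise, level by level,
        # so each level multiplies operands of roughly equal size.
        level = list(candidates)
        while len(level) > 1:
            if len(level) % 2:                 # pad odd-length levels
                level.append(1)
            level = [level[i] * level[i + 1]
                     for i in range(0, len(level), 2)]
        return level[0]

    def accumulate_blocks(p, blocks):
        # Fold each block product into a running product modulo 2^p - 1.
        m = (1 << p) - 1
        acc = 1
        for block in blocks:
            acc = (acc * block_product(block)) % m
        return acc

    # e.g. two tiny blocks of sieved candidates for p = 29:
    # accumulate_blocks(29, [[233, 1103], [2089, 2321]])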
Re: Mersenne: Re: Trial Factoring: Back to the math
At 05:18 PM 2/16/2002, [EMAIL PROTECTED] wrote:

> So the viability of the new algorithm depends on whether we can
> compute P*F (mod 2^p-1) faster than we can compute 2^p (mod F).

To put it in perspective, let's say you are looking for 64 bit factors of
a 10,000,000 bit number.

The basic algorithm is:

    Multiply 156,250 trial factors together to form a 10,000,000 bit
    number.
    do {
        Multiply next 156,250 trial factors making 2nd 10M bit number.
        Multiply the two 10,000,000 bit numbers together mod 2^P-1
    } while there are more 64-bit factors
    GCD (10,000,000 bit product, 2^P-1)

Looking at the inner loop, the second step uses the fast LL multiply code.
On my P4-1400 this takes 0.06 seconds or 84 million clocks, or (dividing
by 156,250) 538 clocks per trial factor. This is quite promising. I
haven't measured it but the current code can test one trial factor in
about 4000 clocks.

The stumbling block is how we can quickly do the first step of the inner
loop - multiplying 156,250 64-bit trial factors together. If someone has a
clever algorithm that can do this in fewer than 3500 clocks per trial
factor then we may have an improved factoring algorithm!
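A rough Python sketch of this batched approach, for reference; Python's
arbitrary-precision integers stand in for the FFT-based multiply, and the
function name, batch size and candidate list are placeholders rather than
Prime95's actual values:

    from math import gcd

    def batched_trial_factor(p, candidates, batch_size=1024):
        # Multiply candidate factors into a running residue mod 2^p - 1,
        # batch by batch, and take a single GCD at the end.
        # A result > 1 means at least one candidate divides 2^p - 1.
        m = (1 << p) - 1
        acc, batch = 1, 1
        for i, q in enumerate(candidates, 1):
            batch *= q                      # build up one batch product
            if i % batch_size == 0:
                acc = (acc * batch) % m     # fold the batch into the residue
                batch = 1
        acc = (acc * batch) % m
        return gcd(acc, m)

    # e.g. candidates of the form 2*k*p + 1 for a small exponent:
    # batched_trial_factor(29, [2*k*29 + 1 for k in range(1, 400)])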
Re: Mersenne: Re: Trial Factoring: Back to the math
On Sat, 16 Feb 2002 [EMAIL PROTECTED] wrote:

> Is this _really_ a problem, or is it likely to become one in the
> foreseeable future? Trial factoring is well ahead of LL testing at
> present, and seems to be still pulling further ahead, despite the
> improvement in the relative efficiency of LL testing vs. trial factoring
> with fairly widespread deployment of P4 systems.

Trial factoring is well ahead of LL testing, but the gap is closing.
Yesterday was the first day in a long time where the net number of
checkouts for factoring exceeded those for first time LL's. That is due
to the fact that one team is having a factoring challenge among their
membership due to begin soon, and large numbers of factoring assignments
are being checked out in preparation. It isn't a trend that is going to
persist. Ongoing, more factoring assignments have to be done than LL
assignments because factoring eliminates a substantial number of exponents
from LL testing.

The situation has improved recently from two earlier surges in LL testing.
One accompanied the discovery of M39, the other was a post-Christmas surge
(GIMPS participants with shiny new fast computers replacing slower ones,
presumably). Many of those who joined up after M39 have dropped out, and
the crest of the wave of their expiries has passed (over 150 assignments
started on Nov. 14th expired in one day, for instance). For awhile, the
balance was around 300 net LL's checked out to 150 factoring assignments.
That has narrowed to around 250 to 200, and sometimes comes fairly even,
although to keep up, GIMPS needs more factoring assignments done than
LL's.

> I would have thought that, in the event that LL testing did start to
> catch up to trial factoring, the first thing to do would be to reduce
> the initial trial factoring depth by 1, then run P-1 and only then
> proceed to do the last bit of trial factoring. In fact on P4 systems it
> might be more economical to stop trial factoring sooner than that.

The cutoffs for those computers set to "do the work that makes the most
sense" can be adjusted to put more computers on trial factoring work if LL
testing starts to overrun trial factoring. The cutoffs could also move
some computers doing first time LL onto doing double checks, although too
much would have the leading edge of DC's running into the trailing edge of
LL's, and I understand that PrimeNet has difficulty with that situation.
Right now DC checkouts are going very slowly, less than 100 a day most
days.

Something that would likely help a lot would be to add factoring
capability and PrimeNet access code to those programs that don't yet have
it so that more computers can reasonably participate.
Re: Mersenne: Re: Trial Factoring: Back to the math
On 15 Feb 2002, at 18:35, [EMAIL PROTECTED] wrote:

> The algorithm under discussion does share some superficial
> features with p-1, in that we do a bunch of big-integer
> modular multiplies followed by a gcd to extract any factors.
> But in fact we have complete control over the factors that
> will be found, by way of the candidates we throw into the
> accumulated product. So it really is equivalent to sieve-based
> trial factoring, the difference being in the way we test the
> candidates.

Ah, now I see clearly what's going on.

> > The other point here is that the GCD is _expensive_. On a 1GHz
> > P3 or Athlon the GCD at the end of P-1 on exponent ~10^7 takes ~
> > 10 minutes.
>
> Ah, but that's (as Rich Schroeppel pointed out) the beauty of
> modular arithmetic. If we only need to do a single gcd, who cares
> if it takes ten minutes or even a few hours?

All right, let's assume that the cost of the GCD is zero, or as close to
zero as makes no difference.

The difference between the new algorithm and "traditional" trial factoring
is that, for each possible factor F that survives being sieved out, we
calculate

    P(n+1) = (P(n)*F) (mod 2^p-1)

and hope that, when we have incorporated sufficient possible factors, we
find GCD(P(N),2^p-1) > 1, whereas the "traditional" method simply
calculates 2^p (mod F) for each F in turn, hoping to find the result 1,
indicating that we have found a factor.

So the viability of the new algorithm depends on whether we can compute
P*F (mod 2^p-1) faster than we can compute 2^p (mod F). The efficiency of
the sieving is irrelevant, because if we can efficiently sieve out factors
for one algorithm, we can use exactly the same sieving technique with the
other algorithm.

When the average value of P is 2^(p-1), F ~ 2^64 and p ~ 10^7 this looks
(to say the least) rather improbable. (Though see later in this message
...)

The point here is that running P-1 stage 1 with B1=10^6 takes a while, but
(a) there are in general going to be a lot more non-trivially-sieved-out
factor candidates between e.g. 2^64 and 2^65 than there are primes < 10^6,
and (b) each multiplication certainly won't be faster - in fact we will
probably be multiplying by a double- or triple-length scalar quantity,
whereas 10^6 fits very easily into a single word.

Now, if we could find a "weak pseudoprimality" test - it needs to be
_much_ faster than Fermat's Test, since computing 3^(f-1) (mod f) is going
to take longer than computing 2^p (mod f) - we could use this to enhance
the screening of factors & so speed up trial factoring (_any_
implementation of it). Might there be mileage in _this_ idea? The test
need not be very accurate, if it's fast enough; certainly we can afford to
take a lot more "false positives" than Fermat's Test lets through.

> If it became clear that we could implement this algorithm and
> have it run within a factor of no more than 3-4 times slower than
> one-factor-at-a-time sieving for factors <= 64 bits, it would be
> very promising indeed, since it suffers no large performance
> hit for factor candidates above 64 bits.

I still don't see this clearly. Surely there is going to be a "large
performance hit" when multiplying a very large integer by a scalar when
the scalar jumps in size from N to N+1 computer words, and N is a small
number (2 for 32-bit architectures, 1 for 64-bit architectures).

> And it would allow
> machines with poor integer multiply (e.g. the Sparc family)
> to join in the factoring work, since most of the work could
> use floating-point arithmetic, irrespective of the size of
> the factors.

Is this _really_ a problem, or is it likely to become one in the
foreseeable future? Trial factoring is well ahead of LL testing at
present, and seems to be still pulling further ahead, despite the
improvement in the relative efficiency of LL testing vs. trial factoring
with fairly widespread deployment of P4 systems.

I would have thought that, in the event that LL testing did start to catch
up to trial factoring, the first thing to do would be to reduce the
initial trial factoring depth by 1, then run P-1 and only then proceed to
do the last bit of trial factoring. In fact on P4 systems it might be more
economical to stop trial factoring sooner than that.

... Enough scepticism; here's a pointer to how a thought triggered by this
discussion could help improve the efficiency of other factoring methods:

It occurred to me that the operation (P * F) (mod 2^p-1) implicit in this
proposal - and in P-1 stage 1 - is messily unbalanced and could probably
be improved. Here's a shot at it.

Note that, if F has an upper bound of 2^k, then F(a) * F(b) has an upper
bound of 2^(2k).

Suppose we build a layered structure as follows. Each layer consists of
two temporary storage areas R1(L), R2(L) and one flag f(L). The temporary
storage areas in the bottom layer need to be big enough to hold the
biggest value we might b
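For contrast, a minimal Python sketch of the "traditional" per-candidate
test referred to above, with the usual q = 2kp+1 and q = +/-1 (mod 8)
restrictions on candidates; the function name and ranges are illustrative,
and real code sieves far more aggressively using machine-word arithmetic:

    def trial_factor(p, k_min, k_max):
        # Test candidates q = 2kp+1 one at a time via binary powering:
        # q divides 2^p - 1 exactly when 2^p == 1 (mod q).
        for k in range(k_min, k_max + 1):
            q = 2 * k * p + 1
            if q % 8 not in (1, 7):     # factors of 2^p-1 are +/-1 mod 8
                continue
            if pow(2, p, q) == 1:
                return q
        return None

    # Example: p = 29 yields the factor 233 = 2*4*29 + 1.
    assert trial_factor(29, 1, 50) == 233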
Mersenne: Re: Trial Factoring: Back to the math
Just an observation: if you are talking about putting less than half of
the candidate factors into the modular product, i.e. stopping when you
reach the square root of Mp, then most of the time you will need to do the
gcd. But if you put all filtered factors into the modular product, at some
point it will become zero. That can even occur before you reach the square
root if Mp has more than 2 factors.

Example: for p = 29, once you multiply by 2089 (k=36), your product has
all three factors in it (k = 4,19,36), which is Mp itself, and Mp mod Mp
is 0.

So a crude brute-force method would be to accumulate factors until the
product became zero (composite) or you run out of factors (prime). This
requires no gcd. Personally, I think the gcd would be cheaper than the
bignum multiplies of the large half of the factors. It would be more
practical to determine some factoring limit similar to what the trial
factoring does now.

Actually, it would be smart to include a check for zero in any version of
this method. For p = 29, it stops at k=36, when the root limit is k=199
and the max limit is k=399. That saves 163 or 363 big multiplies. One
problem is it only gives you that last factor - the one that flipped it to
zero. You *could* store two products, then output: "p = 29 is composite:
q = 2089, prod = 39320847" and do a gcd later.

Regards,
-Bruce
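A quick check of the p = 29 example in Python; the mod-8 filter below is a
stand-in for whatever sieving the real code does:

    p = 29
    m = (1 << p) - 1                    # M29 = 536870911 = 233 * 1103 * 2089
    prod = 1
    for k in range(1, 400):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):         # keep only candidates +/-1 mod 8
            continue
        prod = (prod * q) % m
        if prod == 0:                   # all of M29's factors are now in the product
            print(f"product hit zero at k = {k} (q = {q})")
            break

As in the example, the product first vanishes at k = 36 (q = 2089), once
233 (k = 4) and 1103 (k = 19) are already in it.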
Mersenne: Re: Trial Factoring: Back to the math
Brian Beesley writes:

> Umm. Can someone please reassure me that we're not re-inventing
> P-1 stage 1?

The algorithm under discussion does share some superficial features with
p-1, in that we do a bunch of big-integer modular multiplies followed by a
gcd to extract any factors. But in fact we have complete control over the
factors that will be found, by way of the candidates we throw into the
accumulated product. So it really is equivalent to sieve-based trial
factoring, the difference being in the way we test the candidates.

> The other point here is that the GCD is _expensive_. On a 1GHz
> P3 or Athlon the GCD at the end of P-1 on exponent ~10^7 takes ~
> 10 minutes.

Ah, but that's (as Rich Schroeppel pointed out) the beauty of modular
arithmetic. If we only need to do a single gcd, who cares if it takes ten
minutes or even a few hours?

If it became clear that we could implement this algorithm and have it run
within a factor of no more than 3-4 times slower than one-factor-at-a-time
sieving for factors <= 64 bits, it would be very promising indeed, since
it suffers no large performance hit for factor candidates above 64 bits.
And it would allow machines with poor integer multiply (e.g. the Sparc
family) to join in the factoring work, since most of the work could use
floating-point arithmetic, irrespective of the size of the factors.

-Ernst
Re: Mersenne: Re: Trial Factoring: Back to the math
On 14 Feb 2002, at 20:00, [EMAIL PROTECTED] wrote:

> Rich Schroeppel <[EMAIL PROTECTED]> writes:
>
> <<<
> The cost analysis of trial factoring by GCDs of 2^P-1 with the
> product of many small candidate divisors ignores an important
> optimization: All the candidates can be multiplied together
> mod 2^P-1, and ONLY ONE GCD NEEDS TO BE DONE. The major cost is
> probably the multiplications. Reduction mod 2^P-1 is particularly
> fast, but the same principle applies to random ultra-large targets.

Umm. Can someone please reassure me that we're not re-inventing P-1
stage 1?

The other point here is that the GCD is _expensive_. On a 1GHz P3 or
Athlon the GCD at the end of P-1 on exponent ~10^7 takes ~10 minutes. In
that time you can try ~10^9 63-bit factors, even ignoring those which fall
through the sieve.

Please don't let my scepticism put you off; you're quite right to say that
advances only come from thinking the unthinkable; but, if there is
significant mileage in this idea (and it isn't a straight rehash of P-1),
then (in the words of HHGG) surprise will no longer be adequate, and I
will be forced to turn to astonishment.

Regards
Brian Beesley
Mersenne: Re: Trial Factoring: Back to the math
Rich Schroeppel <[EMAIL PROTECTED]> writes:

<<<
The cost analysis of trial factoring by GCDs of 2^P-1 with the
product of many small candidate divisors ignores an important
optimization: All the candidates can be multiplied together
mod 2^P-1, and ONLY ONE GCD NEEDS TO BE DONE. The major cost is
probably the multiplications. Reduction mod 2^P-1 is particularly
fast, but the same principle applies to random ultra-large targets.

The products are probably best done by grouping pairs of trial
divisors, then pairs of pairs, etc. The lowest levels can be chopped
away if we use divisors in arithmetic progressions, which can be
grouped by (say) eight at a time, and the products evaluated as an
octic polynomial, which can be reduced to evaluating eight bignum
additions per polynomial. The easiest way to develop the necessary
finite difference parameters is to evaluate a few products directly,
and calculate the differences. This dodges the nuisance of computing
polynomial products and Stirling numbers.
>>>

An excellent point, Rich. So now (assuming we do >> 1 p-bit input
products) the single gcd is negligible.

Generating each p-bit input product needs a product of 2 p/2-bit terms,
which were generated from 4 p/4-bit terms, etc. With log2(p) levels in
this recursion, generating each input vector needs O(p * (log(p))^2) work,
which is asymptotically the same as for the gcd, but with a much smaller
constant buried in the O(). (Modular multiply of each new input vector
with the accumulated modular product needs just a single p-bit multiply
and O(p * log(p)) work, and thus is negligible compared to generating the
input vector from the individual factor candidates themselves.)

Thus the asymptotic opcount remains a factor of log(p) times as great as
for the one-candidate (or a few candidates) at a time method, but there
are benefits to this method which may make it attractive, especially on
hardware where we can do gcd very fast (e.g. using floating-point), and
when the size of the candidate factors exceeds the size that lends itself
to fast hardware multiply.

Are you aware of anyone who has implemented an efficient version of such a
method? It would be nice to have some idea of its speed in practice before
spending the significant programming time it would need to code a (say)
Mersenne-optimized version of it for various hardware targets.

-Ernst
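To illustrate the finite-difference trick quoted above: the product of
eight candidates in arithmetic progression, taken block by block, is a
degree-8 polynomial in the block index, so after nine block products are
computed directly, each further block product costs only eight bignum
additions. A small Python check follows; the progression q = 2kp+1 with
p = 29 and the block counts are just example choices:

    d = 2 * 29                          # step for candidates q = 1 + k*d (p = 29)

    def block(n):
        # Direct product of the n-th group of eight consecutive candidates.
        prod = 1
        for j in range(8):
            prod *= 1 + (8 * n + j) * d
        return prod

    # Seed the forward-difference table from the first nine block products.
    table = [[block(n) for n in range(9)]]
    for _ in range(8):
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])

    # Each new block product is eight additions on the difference columns.
    lead = [col[-1] for col in table]   # last entry of each difference order
    for n in range(9, 20):
        for i in range(7, -1, -1):
            lead[i] += lead[i + 1]
        assert lead[0] == block(n)      # matches the directly computed product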
Mersenne: Re: Trial Factoring: Back to the math.
Bruce Leenstra wrote:

> What this list needs right now is a nice juicy math debate, so here
> goes: I was reading the faq about P-1 factoring, and it talks about
> constructing a 'q' that is the product of all primes less than B1 (with
> some multiples?)
> ...
> Right now Prime95 constructs a list of possible factors, filters it
> (beforehand, or on-the-fly) and then uses the powering algorithm to
> calculate 2^p - 1 (mod q) for each factor.

Sandy Harris <[EMAIL PROTECTED]> replied:

> Off-the wall question dep't.:
>
> How does the overhead for that compare to just calculating q, the
> product of all the primes of interest, and then gcd(q, target)? If you
> get 1, then q and target have no common factors. If you get anything
> else, your result is the product of the common factors.

One of the advantages of testing each potential factor individually, or
even in pairs, is that you can stop when you find a factor. With the gcd
method you are always testing a large number of factors. This would be
great if you wanted to completely factor the composite Mersennes; it
probably _would_ be faster for that.

You could study the distribution of factors, and combine that with
overhead costs and speed comparisons to come up with the optimum number of
factors to test at a time. But if most of the factors are small, the gcd
method would always end up being slower.

Regards,
Bruce Leenstra
Mersenne: Re: Trial Factoring: Back to the math.
Bruce Leenstra wrote:

> What this list needs right now is a nice juicy math debate, so here
> goes: I was reading the faq about P-1 factoring, and it talks about
> constructing a 'q' that is the product of all primes less than B1 (with
> some multiples?)
> ...
> Right now Prime95 constructs a list of possible factors, filters it
> (beforehand, or on-the-fly) and then uses the powering algorithm to
> calculate 2^p - 1 (mod q) for each factor.

Sandy Harris <[EMAIL PROTECTED]> replied:

> Off-the wall question dep't.:
>
> How does the overhead for that compare to just calculating q, the
> product of all the primes of interest, and then gcd(q, target)? If you
> get 1, then q and target have no common factors. If you get anything
> else, your result is the product of the common factors.

Actually, that's not off the wall (or out of left field, as fans of
baseball refer to it) at all. In fact, Richard Crandall mentions such a
scheme in at least one of his books (either Pomerance & Crandall or
Crandall's "Projects in Scientific Computing" or both.)

Here's how it compares to the sieve-based factoring: assume we have many
candidate factors of the number 2^p-1, having size <= q bits. (Precisely
speaking, we require an O(1) fraction of these being O(q) in size, but
that's a technicality.) Assuming q bits is on the order of a computer
wordsize, to test each one of these separately requires O(log2(p)) CPU
operations (i.e. the O(log2(p)) modular squarings of a binary powering
algorithm.)

Using the gcd-based method, we multiply roughly p/q of the factor
candidates together to get a product of <= p bits, then gcd with 2^p-1.
The fastest known gcd algorithms need O(p * (log2(p))^2) operations, i.e.
they are O(log2(p)) times as expensive as an FFT-based big-integer
multiply of numbers this size. (That doesn't seem too bad at first glance,
but the O() hides a much larger proportionality constant than the FFT, as
well.) Dividing by the number of candidates that get processed per gcd, we
see that this method needs O((log2(p))^2) operations per candidate, where
we've absorbed the q (which is assumed to be of order unity, since a
computer wordsize is such) into the implied proportionality constant.

That makes this method asymptotically slower than
one-factor-candidate-at-a-time sieving, and the added overhead of the fast
gcd imposes an additional speed penalty. Thus, this method (which does
have the advantage that it can use floating-point FFT for the
large-integer multiplies of the fast gcd) would only become attractive if
integer multiply were grindingly slow or if one were dealing with some
kind of vector hardware which could do floating-point FFT screamingly fast
but were somehow constrained to doing just O(1) integer multiplies per
cycle. Not a likely combination, but it's an interesting theme
nonetheless.

Note that one advantage of the gcd-based method is that factoring cost
doesn't depend overmuch on the hardware wordsize, since we can break the
factor product into whichever-sized chunks are convenient for the FFT. For
the sieving method, as soon as q exceeds the hardware wordsize by even a
single bit, we get hit with a 3-4x speed penalty. But given that the gcd
method is probably around 100x slower than sieving for q no larger than a
wordsize, it's going to take some very large factor candidates indeed to
make the gcd method attractive.

But keep those off-the-wallers coming - it's that kind of thinking that
often leads to real algorithmic breakthroughs!

Cheers,
-Ernst
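One crude way to get a feel for the constants involved is to race the two
approaches in Python on a modest exponent. Python's bignums are nothing
like optimized sieving or fast-gcd code, the exponent and candidate range
below are arbitrary, and the accumulation is the simple one-candidate-at-
a-time form rather than a subproduct tree, so treat the output as
qualitative only:

    import time
    from math import gcd

    p = 100003                          # an arbitrary prime exponent
    m = (1 << p) - 1
    candidates = [2 * k * p + 1 for k in range(1, 20001)]
    candidates = [q for q in candidates if q % 8 in (1, 7)]

    t0 = time.perf_counter()
    hits = [q for q in candidates if pow(2, p, q) == 1]   # per-candidate powering
    t1 = time.perf_counter()

    acc = 1
    for q in candidates:                # batched product, single gcd at the end
        acc = (acc * q) % m
    g = gcd(acc, m)
    t2 = time.perf_counter()

    print(f"powering: {t1 - t0:.3f}s, divisors found: {hits}")
    print(f"batched gcd: {t2 - t1:.3f}s, nontrivial gcd: {g > 1}")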