Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-17 Thread Justin Valcourt

> Me as well.  From what I understand the breakpoints to be, my K6-2 400
> would be put on first time LL's by default.  I tried it on DC's, but the
> FPU is so poor that it is really only suited to factoring.  My 466 Celeron
> could do LL's very slowly, but given the relative balance between LL's and
> DC's right now, I chose to put it on DC's.  It's unsexy work, highly unlikely
> to find a prime, and yet it still needs to be done.

I have cached on my P-200 4 TF assignments that fall under the 2^66 bit
limit. This machine takes roughly one week to complete a run to 2^65. I don't
know what I'm going to do with this machine after it starts crunching TF on
18M. I did one of those in the past; it took nearly a month :(

But I refuse to retire this machine. Let it crunch away until the mobo
fries. :)




Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-17 Thread Mary Conner

On Sun, 17 Feb 2002 [EMAIL PROTECTED] wrote:
> Ah, but, factoring is actually moving ahead anyway in terms of 
> CPU years required for LL to catch up - because larger exponents 
> take more time to LL test. My impression is that the gap between 
> the head of the factoring assignment queue and the head of the LL 
> testing assignment queue has remained fairly steady at 4 to 5 
> million for at least three years.

And every time we move up a bit level, the time required to factor an
exponent goes up as well.  The gap is only 3.7M right now, and
PrimeNet seems to be having that problem with large numbers of factoring
assignments not being handed out correctly.  I'm not trying to be Chicken
Little here; I don't think there is any need for a "call to arms" on
factoring (since the balance of machines on each task can be somewhat
manipulated centrally).  I just wanted to correct the misperception that
factoring was going faster than LL.

> Another point, surely, is that factoring assignments "suddenly 
> slowed down" when we reached the changeover point between 65 
> bit and 66 bit trial factoring depth in the low 18 millions.

The cutoff is at 17.85M if I remember correctly.  I really only look at
"net assignments checked out" because that is an easy number to
eyeball.  That rate won't be affected by a jump to a new factoring depth
right away, but that will cause some slowdown later on as machines given
the bigger assignments are slower to return them and check out new
assignments.  The biggest change lately has been that LL assignments have
slowed down somewhat (and 33M assignments may have picked up, but those
are a lot more variable, and harder to eyeball a trend unless a record is
kept).  Factoring assignments have picked up somewhat recently, but I
don't think this is a permanent trend.  I suspect it is largely if not
entirely due to the challenge that one team is embarking upon, and that is
going to last only a month.

> Eh? The only PrimeNet problem I'm aware of in this respect is 
> that it can't cope with having LL and DC assignments for the same 
> exponent out at the same time.

It may have been fixed, but in the archives I saw references to PrimeNet
not properly expiring and then handing out again first time LL's if they
were in a range that had been overrun by DC's.  If there is no PrimeNet
problem, then there is no problem with the DC range overrunning the LL
range at all.

> Well, some of us try - personally I have several systems which 
> would be running LL assignments if left to "makes most sense" 
> running DCs instead. 

Me as well.  From what I understand the breakpoints to be, my K6-2 400
would be put on first time LL's by default.  I tried it on DC's, but the
FPU is so poor that it is really only suited to factoring.  My 466 Celeron
could do LL's very slowly, but given the relative balance between LL's and
DC's right now, I chose to put it on DC's.  It's unsexy work, highly unlikely
to find a prime, and yet it still needs to be done.

> The difficulty here (it's purely a matter of perception, not a technical 
> problem) is that, by pushing up the number of systems that get DC 
> assignments by default, we risk alienating new users who find that 
> they're not being given "exciting" assignments, and those with 
> slower systems who find that even DC assignments are taking well 
> over a month.

I don't think there is really anything wrong with the DC breakpoint as it
is right now.  The spread between leading and trailing edge is not
excessive, and the leading edge is not that far behind the trailing edge
of LL.  It's pretty well balanced.

And the point about "exciting" assignments is well taken.  There was a
surge of new users after M39 was discovered, and of those who did not
stick around (and let their assignments expire), the vast majority were
doing first time LL.  Most likely they were disappointed by how long it takes
to complete one and unwilling to do less sexy but less time-consuming
work.  But in spite of those who dropped out, it is evident that GIMPS
picked up quite a few users who have stuck it out, so the expiries from
those dropping out are just the cost of recruiting new users.

There's also the issue of all the myriad things that can happen between
the time an assignment is started and the time it ends.  Today I went to
help my father set up his new 800MHz Duron, and I put Prime95 on
it.  Despite the fact that it is quite capable of doing LL's, I put it on
factoring.  His history with screwing up his machine (though I still love
him, bless his heart) doesn't bode well for anything but the shortest of
assignments.  Longer assignments just don't survive those who tend to blow
away their boxes every now and then.

Which leads me to something else that might benefit factoring.  Factoring
doesn't have huge save files like LL does; they're only 32 bytes.  I'd
suggest that any time a box updates its expected completion dates,
it send along that saved information.  If the assign

RE: Assignment balancing (was: Re: Mersenne: Re: Trial Factoring: Back to the math)

2002-02-17 Thread Aaron Blosser

Another thing people might want to consider...

If you have a dual processor box, have one of the prime threads doing
factoring while the other does LL tests.

I've been meaning to do that for a while now, and I guess I might get
around to that next week.  I think most of the machines I have testing
are dual processor, so that should increase my factoring output at
least. :)

I've also got my slower machines (under 500MHz or so) already doing
factoring... goes back to something I've always said: pick the right
assignments for the right machines. :)

Aaron

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:mersenne-invalid-[EMAIL PROTECTED]]
> On Behalf Of [EMAIL PROTECTED]
> Sent: Sunday, February 17, 2002 2:55 PM
> To: Russel Brooks
> Cc: [EMAIL PROTECTED]
> Subject: Assignment balancing (was: Re: Mersenne: Re: Trial Factoring:
> Back to the math)
> 
> On 17 Feb 2002, at 17:54, Russel Brooks wrote:
> 
> > Am I interpreting this thread correctly?  That more factoring is
> > needed?  My climb up the LL top producers is starting to stall
> > so maybe it's time to switch to factoring.
> 
> I'm quite sure there's no need to panic!
> 
> So far as I'm concerned, I've been underdoing factoring - less than
> 5% of my contribution. Theoretically the best balance for the
> project as a whole is about 10% of the CPU effort in trial factoring.
> 
> The other point here is that we have had two major improvements in
> the efficiency of the LL testing algorithm since the last time trial
> factoring was seriously looked at, so some theoretical work on trial
> factoring is probably due.
> 
> Why not try some factoring assignments, if _you_ think it will be
> fun for a change? You can always change back when you get
> bored...





Assignment balancing (was: Re: Mersenne: Re: Trial Factoring: Back to the math)

2002-02-17 Thread bjb

On 17 Feb 2002, at 17:54, Russel Brooks wrote:

> Am I interpreting this thread correctly?  That more factoring is
> needed?  My climb up the LL top producers is starting to stall
> so maybe it's time to switch to factoring.

I'm quite sure there's no need to panic!

So far as I'm concerned, I've been underdoing factoring - less than 
5% of my contribution. Theoretically the best balance for the 
project as a whole is about 10% of the CPU effort in trial factoring.

The other point here is that we have had two major improvements in 
the efficiency of the LL testing algorithm since the last time trial 
factoring was seriously looked at, so some theoretical work on trial 
factoring is probably due.

Why not try some factoring assignments, if _you_ think it will be 
fun for a change? You can always change back when you get 
bored...
> 
> I know the LL tests drive the FPU hard, what about factoring?

Factoring is a light FPU and memory load. LL (and DC) 
assignments are heavy on both.

Indications to run factoring irrespective of CPU speed:

1) One CPU on a dual processor system. Both processors running 
LL tests hit the memory bus too hard and ruin performance.
2) Some processors (AMD K6, and especially Cyrix 6x86) have 
inefficient FPUs; it is much more effective to have these processors 
running trial factoring.
3) A PIII or Athlon system running very close to the limit of the 
cooling system. Because the load on the FPU imposed by 
factoring is much lower, the power consumption of the CPU will 
fall if factoring is run instead of LL testing, and the temperature 
will therefore drop significantly.
4) System very short of main memory. Trial factoring uses very 
little memory for data work space.

Counterindications:

It doesn't seem to be sensible to be running trial factoring on a P4. 
The P4 architecture is _very_ efficient at running LL tests, but the 
performance on the instruction mix used in trial factoring is a great 
deal worse (clock for clock) than PII/PIII/Athlon.

Regards
Brian Beesley



Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-17 Thread Russel Brooks

Am I interpreting this thread correctly?  That more factoring is
needed?  My climb up the LL top producers is starting to stall
so maybe it's time to switch to factoring.

I know the LL tests drive the FPU hard, what about factoring?

Cheers... Russ




Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-17 Thread bjb

On 16 Feb 2002, at 12:26, Mary Conner wrote:

> Trial factoring is well ahead of LL testing, but the gap is closing.  
> Yesterday was the first day in a long time where the net number of
> checkouts for factoring exceeded those for first time LL's.  That is due
> to the fact that one team is having a factoring challenge among their
> membership due to begin soon, and large numbers of factoring assignments
> are being checked out in preparation.  It isn't a trend that is going to
> persist.  Ongoing, more factoring assignments have to be done than LL
> assignments because factoring eliminates a substantial number of exponents
> from LL testing.

Ah, but, factoring is actually moving ahead anyway in terms of 
CPU years required for LL to catch up - because larger exponents 
take more time to LL test. My impression is that the gap between 
the head of the factoring assignment queue and the head of the LL 
testing assignment queue has remained fairly steady at 4 to 5 
million for at least three years.

Nevertheless I've just tripled my factoring effort from a single PIII-
500 by switching one CPU on a dual processor PIII-1000 system 
from ECM to factoring. The other CPU is of course running LL tests.
> 
> The situation has improved recently from two earlier surges in LL testing.  
> One accompanied the discovery of M39, the other was a post-Christmas surge
> (GIMPS participants with shiny new fast computers replacing slower ones,
> presumably).  Many of those who joined up after M39 have dropped out, and
> the crest of the wave of their expiries has passed (over 150 assignments
> started on Nov. 14th expired in one day, for instance).  For awhile, the
> balance was around 300 net LL's checked out to 150 factoring assignments.  
> That has narrowed to around 250 to 200, and sometimes comes fairly even,
> although to keep up, GIMPS needs more factoring assignments done than
> LL's.

Another point, surely, is that factoring assignments "suddenly 
slowed down" when we reached the changeover point between 65 
bit and 66 bit trial factoring depth in the low 18 millions.
> 
> The cutoffs for those computers set to "do the work that makes the most
> sense" can be adjusted to put more computers on trial factoring work if LL
> testing starts to overrun trial factoring.  The cutoffs could also move
> some computers doing first time LL onto doing double checks, although too
> much would have the leading edge of DC's running into the trailing edge of
> LL's, and I understand that PrimeNet has difficulty with that situation.  

Eh? The only PrimeNet problem I'm aware of in this respect is 
that it can't cope with having LL and DC assignments for the same 
exponent out at the same time.

> Right now DC checkouts are going very slowly, less than 100 a day most
> days.

Well, some of us try - personally I have several systems which 
would be running LL assignments if left to "makes most sense" 
running DCs instead. 

The difficulty here (it's purely a matter of perception, not a technical 
problem) is that, by pushing up the number of systems that get DC 
assignments by default, we risk alienating new users who find that 
they're not being given "exciting" assignments, and those with 
slower systems who find that even DC assignments are taking well 
over a month.
> 
> Something that would likely help a lot would be to add factoring
> capability and PrimeNet access code to those programs that don't yet have
> it so that more computers can reasonably participate.   

Yes. There are also a lot of non-Intel un*x systems out there (most 
of which would be happier with DC than LL assignments) that we 
could probably attract if we had an automatic interface for obtaining 
assignments from PrimeNet and submitting results of completed 
tests. At least two reasonably efficient and reliable clients (Mlucas 
and Glucas) are already available for a good range of host 
systems; I'm sure the perceived complications of using the manual 
assignment forms are responsible for at least some users not 
participating.

Regards
Brian Beesley



Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-16 Thread bjb

On 16 Feb 2002, at 15:42, George Woltman wrote:

> To put it in perspective, let's say you are looking for 64 bit factors
> of a 10,000,000 bit number.
> 
> The basic algorithm is:
>  Multiply 156,250 trial factors together to form a 10,000,000 bit 
> number.
>  do {
>  Multiply next 156,250 trial factors making 2nd 10M bit number.
>  Multiply the two 10,000,000 bit numbers together mod 2^P-1
>  } while there are more 64-bit factors
>  GCD (10,000,000 bit product, 2^P-1)

My suggested modification groups things a bit differently: by and 
large we're working in powers of 2, so the first step of the inner loop 
would go something like

Multiply 131072 64-bit numbers together in pairs
Multiply the resulting 65536 128-bit products together in pairs
...
Multiply the resulting four 2097152-bit products together in pairs
Multiply the resulting two 4194304-bit products together to form 
the 8388608-bit block product

Repeat the above for the next block of 131072 64-bit numbers

Multiply each 8388608-bit block product into the running ~10M-bit 
product and reduce modulo 2^P-1

etc. etc.

So for each block of 131072 numbers we have
65536 64-bit multiplications
32768 128-bit multiplications
16384 256-bit multiplications
8192 512-bit multiplications
4096 1024-bit multiplications
2048 2048-bit multiplications
1024 4096-bit multiplications
512 8192-bit multiplications
256 16384-bit multiplications
128 32768-bit multiplications
64 65536-bit multiplications
32 131072-bit multiplications
16 262144-bit multiplications
8 524288-bit multiplications
4 1048576-bit multiplications
2 2097152-bit multiplications
1 4194304-bit multiplication
and one 8388608 x 10M bit multiplication modulo 2^p-1

There seems to be about the same amount of work in each level, 
nevertheless the lower levels should be faster since even FFT is 
O(n log n) and we can probably do better than FFT for the very 
smallest products.
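
(As a rough illustration of this pairwise scheme, here is a toy Python sketch - not Prime95 code, with a tiny block of candidates rather than 2^17 of them, and plain integer multiplies standing in for the FFT-based ones:)

from math import gcd

def block_product_tree(candidates, m):
    # Multiply a block of candidate factors pairwise, level by level, and
    # reduce the final block product modulo m = 2^p - 1.  Toy sizes only;
    # a real block would hold 2^17 64-bit candidates and the upper levels
    # would use FFT-based multiplication.
    level = list(candidates)
    while len(level) > 1:
        if len(level) % 2:                   # pad with 1 if the count is odd
            level.append(1)
        level = [level[i] * level[i + 1]     # operand sizes double at each level
                 for i in range(0, len(level), 2)]
    return level[0] % m                      # one reduction at the top of the tree

p = 29
m = (1 << p) - 1
block = [2 * k * p + 1 for k in range(1, 41) if (2 * k * p + 1) % 8 in (1, 7)]
print(gcd(block_product_tree(block, m), m))  # 536870911: the block held all three factors of 2^29-1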
> 
> Looking at the inner loop the second step uses the fast LL multiply code.
> On my P4-1400 this takes 0.06 seconds or 84 million clocks or (dividing
> by 156,250) 538 clocks per trial factor.  This is quite promising.  I haven't
> measured it but the current code can test one trial factor in about 4000
> clocks.

My scheme has 17 levels so the upper bound of the _average_ 
time per trial factor should be 17 * 538 clocks. This is too darned 
slow but, as I indicated above, the actual execution time should be 
significantly less than that.

But I thought the time to actually test a 64-bit factor - not counting 
sieving overheads - was less than 2000 clocks ??? 
> 
> The stumbling block is how can we quickly do the first step of the inner
> loop - multiplying 156,250 bit trial factors together.  If someone has a
> clever algorithm that can do this in fewer than 3500 clocks per trial factor
> then we may have an improved factoring algorithm!

The other point I was trying to make is that a similar algorithm 
could be used to compute the product of small primes used by P-1 
stage 1 (and maybe stage 2 as well) with a highly significant 
improvement over the current method. There may also be an 
application in ECM.

Regards
Brian Beesley



Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-16 Thread George Woltman

At 05:18 PM 2/16/2002 +, [EMAIL PROTECTED] wrote:
>So the viability of the new algorithm depends on whether we can
>compute P*F (mod 2^p-1) faster than we can compute 2^p (mod F).

To put it in perspective, let's say you are looking for 64 bit factors
of a 10,000,000 bit number.

The basic algorithm is:
 Multiply 156,250 trial factors together to form a 10,000,000 bit 
number.
 do {
 Multiply next 156,250 trial factors making 2nd 10M bit number.
 Multiply the two 10,000,000 bit numbers together mod 2^P-1
 } while there are more 64-bit factors
 GCD (10,000,000 bit product, 2^P-1)
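
(In rough Python terms, a toy version of this accumulate-and-gcd idea might look like the following - this is not Prime95 code, the sizes are tiny and purely illustrative:)

from math import gcd

def trial_factor_by_gcd(p, k_limit):
    # Fold candidate factors q = 2kp+1 into one running product modulo
    # 2^p-1; a single gcd at the end reveals any factors that were hit.
    m = (1 << p) - 1                         # the Mersenne number 2^p - 1
    acc = 1
    for k in range(1, k_limit + 1):
        q = 2 * k * p + 1                    # candidate factor form for 2^p-1
        if q % 8 not in (1, 7):              # factors must be +-1 (mod 8)
            continue
        acc = (acc * q) % m                  # in the real scheme this is the big mod-2^P-1 multiply
    return gcd(acc, m)                       # > 1 means at least one factor was caught

print(trial_factor_by_gcd(29, 20))           # 256999 = 233 * 1103, two factors of 2^29-1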

Looking at the inner loop the second step uses the fast LL multiply code.
On my P4-1400 this takes 0.06 seconds or 84 million clocks or (dividing
by 156,250) 538 clocks per trial factor.  This is quite promising.  I haven't
measured it but the current code can test one trial factor in about 4000
clocks.

The stumbling block is how can we quickly do the first step of the inner
loop - multiplying the 156,250 64-bit trial factors together.  If someone has a
clever algorithm that can do this in fewer than 3500 clocks per trial factor
then we may have an improved factoring algorithm!





Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-16 Thread Mary Conner



On Sat, 16 Feb 2002 [EMAIL PROTECTED] wrote:
> Is this _really_ a problem, or is it likely to become one in the 
> forseeable future? Trial factoring is well ahead of LL testing at 
> present, and seems to be still pulling further ahead, despite the 
> improvement in the relative efficiency of LL testing vs. trial factoring 
> with fairly widespread deployment of P4 systems.

Trial factoring is well ahead of LL testing, but the gap is closing.  
Yesterday was the first day in a long time where the net number of
checkouts for factoring exceeded those for first time LL's.  That is due
to the fact that one team is having a factoring challenge among their
membership due to begin soon, and large numbers of factoring assignments
are being checked out in preparation.  It isn't a trend that is going to
persist.  Ongoing, more factoring assignments have to be done than LL
assignments because factoring eliminates a substantial number of exponents
from LL testing.

The situation has improved recently from two earlier surges in LL testing.  
One accompanied the discovery of M39, the other was a post-Christmas surge
(GIMPS participants with shiny new fast computers replacing slower ones,
presumably).  Many of those who joined up after M39 have dropped out, and
the crest of the wave of their expiries has passed (over 150 assignments
started on Nov. 14th expired in one day, for instance).  For awhile, the
balance was around 300 net LL's checked out to 150 factoring assignments.  
That has narrowed to around 250 to 200, and sometimes comes fairly even,
although to keep up, GIMPS needs more factoring assignments done than
LL's.

> I would have thought that, in the event that LL testing did start to 
> catch up to trial factoring, the first thing to do would be to reduce 
> the initial trial factoring depth by 1, then run P-1 and only then 
> proceed to do the last bit of trial factoring. In fact on P4 systems it 
> might be more economical to stop trial factoring sooner than that.

The cutoffs for those computers set to "do the work that makes the most
sense" can be adjusted to put more computers on trial factoring work if LL
testing starts to overrun trial factoring.  The cutoffs could also move
some computers doing first time LL onto doing double checks, although too
much would have the leading edge of DC's running into the trailing edge of
LL's, and I understand that PrimeNet has difficulty with that situation.  
Right now DC checkouts are going very slowly, less than 100 a day most
days.

Something that would likely help a lot would be to add factoring
capability and PrimeNet access code to those programs that don't yet have
it so that more computers can reasonably participate.   




Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-16 Thread bjb

On 15 Feb 2002, at 18:35, [EMAIL PROTECTED] wrote:

> The algorithm under discussion does share some superficial
> features with p-1, in that we do a bunch of a big-integer
> modular multiplies followed by a gcd to extract any factors.
> But in fact we have complete control over the factors that
> will be found, by way of the candidates we throw into the
> accumulated product. So it really is equivalent to sieve-based
> trial factoring, the difference being in the way we test the
> candidates.

Ah, now I see clearly what's going on.
> 
> >The other point here is that the GCD is _expensive_. On a 1GHz 
> >P3 or Athlon the GCD at the end of P-1 on exponent ~10^7 takes ~ 
> >10 minutes.
> 
> Ah, but that's (as Rich Schroeppel pointed out) the beauty of
> modular arithmetic. If we only need to do a single gcd, who cares
> if it takes ten minutes or even a few hours?

All right, let's assume that the cost of the GCD is zero, or as close 
to zero as makes no difference.

The difference between the new algorithm and "traditional" trial 
factoring is that, for each possible factor F that survives the 
sieve, we calculate P(n+1) = (P(n)*F) (mod 2^p-1) and hope 
that, when we have incorporated sufficient possible factors, we find 
GCD(P(N),2^p-1) > 1, whereas the "traditional" method simply 
calculates 2^p (mod F) for each F in turn, hoping to find the result 1 
indicating that we have found a factor.

So the viability of the new algorithm depends on whether we can 
compute P*F (mod 2^p-1) faster than we can compute 2^p (mod F).
The efficiency of the sieving is irrelevant, because if we can 
efficiently sieve out factors for one algorithm, we can use exactly 
the same sieving technique with the other algorithm.
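
(For comparison, a toy Python sketch of the "traditional" side of that trade-off: 
sieve the candidates cheaply, then test each survivor with one modular 
powering. The tiny small-prime list here is purely illustrative; the real sieve 
is far more aggressive.)

def small_prime_survivors(p, k_limit, sieve_primes=(3, 5, 7, 11, 13)):
    # Candidate factors q = 2kp+1 of 2^p-1 that survive the cheap screening
    # both methods share: q = +-1 (mod 8) and q not divisible by a few small
    # primes (illustrative list; the real sieve uses many more).
    for k in range(1, k_limit + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue
        if any(q % s == 0 for s in sieve_primes):
            continue
        yield q

def traditional_trial_factor(p, k_limit):
    # Classic per-candidate test: q divides 2^p-1 exactly when 2^p = 1 (mod q).
    return [q for q in small_prime_survivors(p, k_limit) if pow(2, p, q) == 1]

print(traditional_trial_factor(29, 40))      # [233, 1103, 2089]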

When the average value of P is 2^(p-1), F ~ 2^64 and p ~ 10^7 this 
looks (to say the least) rather improbable. (Though see later in this 
message ...) The point here is that running P-1 stage 1 with 
B1=10^6 takes a while, but (a) there are in general going to be a lot 
more non-trivially-sieved-out factor candidates between e.g. 2^64 
and 2^65 than there are primes < 10^6, and (b) each multiplication 
certainly won't be faster - in fact we will probably be multiplying by 
a double- or triple-length scalar quantity, whereas 10^6 fits very 
easily into a single word.

Now, if we could find a "weak pseudoprimality" test - needs to be 
_much_ faster than Fermat's Test, since computing 3^(f-1) (mod f) 
is going to take longer than computing 2^p (mod f) - we could use 
this to enhance the screening of factors & so speed up trial 
factoring (_any_ implementation of it). Might there be mileage in 
_this_ idea? The test need not be very accurate, if it's fast enough; 
certainly we can afford to take a lot more "false positives" than 
Fermat's Test lets through.
> 
> If it became clear that we could implement this algorithm and
> have it run within a factor of no more than 3-4 times slower than
> one-factor-at-a-time sieving for factors <= 64 bits, it would be
> very promising indeed, since it suffers no large performance
> hit for factor candidates above 64 bits.

I still don't see this clearly. Surely there is going to be a "large 
performance hit" when multiplying a very large integer by a scalar 
when the scalar jumps in size from N to N+1 computer words, and 
N is a small number (2 for 32-bit architectures, 1 for 64-bit 
architectures).

> And it would allow
> machines with poor integer multiply (e.g. the Sparc family)
> to join in the factoring work, since most of the work could
> use floating-point arithmetic, irrespective of the size of
> the factors.

Is this _really_ a problem, or is it likely to become one in the 
forseeable future? Trial factoring is well ahead of LL testing at 
present, and seems to be still pulling further ahead, despite the 
improvement in the relative efficiency of LL testing vs. trial factoring 
with fairly widespread deployment of P4 systems.

I would have thought that, in the event that LL testing did start to 
catch up to trial factoring, the first thing to do would be to reduce 
the initial trial factoring depth by 1, then run P-1 and only then 
proceed to do the last bit of trial factoring. In fact on P4 systems it 
might be more economical to stop trial factoring sooner than that.

...

Enough scepticism; here's a pointer to how thinking triggered by 
this discussion could help improve the efficiency of other factoring 
methods:

It occurred to me that the operation (P * F) (mod 2^p-1) implicit in 
this proposal - and P-1 stage 1 - is messily unbalanced and could 
probably be improved. Here's a shot at it.

Note that, if F has an upper bound of 2^k, then F(a) * F(b) has an 
upper bound of 2^(2k).

Suppose we build a layered structure as follows. Each layer 
consists of two temporary storage areas R1(L), R2(L) and one flag 
f(L). The temporary storage areas in the bottom layer need to be 
big enough to hold the biggest value we might b

Mersenne: Re: Trial Factoring: Back to the math

2002-02-15 Thread Bruce Leenstra

Just an observation,

If you are talking about putting less than half of the candidate factors
into the modular product, i.e. stopping when you reach the square root of
Mp, then most of the time you will need to do the gcd. But if you put all
filtered factors into the modular product, at some point it will become
zero. That can even occur before you reach the square root if Mp has more
than 2 factors.

example: for p = 29, once you multiply by 2089 (k=36), your product has all
three factors in it (k = 4,19,36), which is Mp itself, and Mp mod Mp is 0.

So a crude brute-force method would be to accumulate factors until the
product became zero (composite) or you run out of factors (prime).
This requires no gcd. Personally, I think the gcd would be cheaper than the
bignum multiplies of the large half of the factors.
It would be more practical to determine some factoring limit similar to what
the trial factoring does now.

Actually, it would be smart to include a check for zero in any version of
this method. For p = 29, it stops at k=36, when the root limit is k=199,
and the max limit is k = 399. That saves 163 or 363 big multiplies.
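
(A quick Python check of the p = 29 example - a toy sketch using only the
usual +-1 mod 8 filter on the candidates:)

from math import gcd

p = 29
m = (1 << p) - 1                  # 536870911 = 233 * 1103 * 2089
prod_mod, prev = 1, 1
for k in range(1, 400):
    q = 2 * k * p + 1
    if q % 8 not in (1, 7):       # the usual +-1 (mod 8) filter
        continue
    prev, prod_mod = prod_mod, (prod_mod * q) % m
    if prod_mod == 0:             # every prime factor of m is now in the product
        print("zero at k =", k, "last q =", q)       # k = 36, q = 2089
        print("the other factors:", gcd(prev, m))    # 256999 = 233 * 1103
        break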

One problem is it only gives you that last factor - the one that flipped it
to zero. You *could* store two products, then output:
"p = 29 is composite: q = 2089, prod = 39320847"  and do a gcd
later.



Regards,
-Bruce




Mersenne: Re: Trial Factoring: Back to the math

2002-02-15 Thread EWMAYER
Brian Beesley writes:

>Umm. Can someone please reassure me that we're not re-inventing 
>P-1 stage 1?

The algorithm under discussion does share some superficial
features with p-1, in that we do a bunch of a big-integer
modular multiplies followed by a gcd to extract any factors.
But in fact we have complete control over the factors that
will be found, by way of the candidates we throw into the
accumulated product. So it really is equivalent to sieve-based
trial factoring, the difference being in the way we test the
candidates.

>The other point here is that the GCD is _expensive_. On a 1GHz 
>P3 or Athlon the GCD at the end of P-1 on exponent ~10^7 takes ~ 
>10 minutes.

Ah, but that's (as Rich Schroeppel pointed out) the beauty of
modular arithmetic. If we only need to do a single gcd, who cares
if it takes ten minutes or even a few hours?

If it became clear that we could implement this algorithm and
have it run within a factor of no more than 3-4 times slower than
one-factor-at-a-time sieving for factors <= 64 bits, it would be
very promising indeed, since it suffers no large performance
hit for factor candidates above 64 bits. And it would allow
machines with poor integer multiply (e.g. the Sparc family)
to join in the factoring work, since most of the work could
use floating-point arithmetic, irrespective of the size of
the factors.

-Ernst


Re: Mersenne: Re: Trial Factoring: Back to the math

2002-02-15 Thread bjb

On 14 Feb 2002, at 20:00, [EMAIL PROTECTED] wrote:

> Rich Schroeppel <[EMAIL PROTECTED]> writes:
> 
> <<<
> The cost analysis of trial factoring by GCDs of 2^P-1 with the
> product of many small candidate divisors ignores an important
> optimization: All the candidates can be multiplied together
> mod 2^P-1, and ONLY ONE GCD NEEDS TO BE DONE. The major cost is
> probably the multiplications. Reduction mod 2^P-1 is particularly
> fast, but the same principle applies to random ultra-large targets.

Umm. Can someone please reassure me that we're not re-inventing 
P-1 stage 1?

The other point here is that the GCD is _expensive_. On a 1GHz 
P3 or Athlon the GCD at the end of P-1 on exponent ~10^7 takes ~ 
10 minutes. In that time you can try ~ 10^9 63-bit factors, even 
ignoring those which fall through the sieve.

Please don't let my scepticism put you off; you're quite right to say 
that advances only come from thinking the unthinkable; but, if there 
is significant mileage in this idea (and it isn't a straight rehash of 
P-1), then (in the words of HHGG) surprise will no longer be 
adequate, and I will be forced to turn to astonishment.

Regards
Brian Beesley



Mersenne: Re: Trial Factoring: Back to the math

2002-02-14 Thread EWMAYER
Rich Schroeppel <[EMAIL PROTECTED]> writes:

<<<
The cost analysis of trial factoring by GCDs of 2^P-1 with the
product of many small candidate divisors ignores an important
optimization: All the candidates can be multiplied together
mod 2^P-1, and ONLY ONE GCD NEEDS TO BE DONE. The major cost is
probably the multiplications. Reduction mod 2^P-1 is particularly
fast, but the same principle applies to random ultra-large targets.

The products are probably best done by grouping pairs of trial
divisors, then pairs of pairs, etc. The lowest levels can be
chopped away if we use divisors in arithmetic progressions, which
can be grouped by (say) eight at a time, and the products evaluated
as an octic polynomial, which can be reduced to evaluating eight
bignum additions per polynomial. The easiest way to develop the
necessary finite difference parameters is to evaluate a few
products directly, and calculate the differences. This dodges
the nuisance of computing polynomial products and Stirling numbers.
>>>

An excellent point, Rich. So now (assuming we do >> 1 p-bit input
products) the single gcd is negligible. Generating each p-bit input
product needs a product of 2 p/2-bit terms, which were generated
from 4 p/4-bit terms, etc. With log2(p) levels in this recursion,
generating each input vector needs O(p * (log(p))^2) work, which
is asymptotically the same as for the gcd, but with a much smaller
constant buried in the O() (Modular multiply of each new input
vector with the accumulated modular product needs just a single
p-bit multiply and O(p * log(p)) work, and thus is negligible
compared to generating the input vector from the individual factor
candidates themselves). Thus the asymptotic opcount remains
a factor of log(p) times as great as for the one-candidate (or a
few candidates) at a time method, but there are benefits to this
method which may make it attractive, especially on hardware where
we can do gcd very fast (e.g. using floating-point), and when the
size of the candidate factors exceeds the size that lends itself to
fast hardware multiply.

Are you aware of anyone who has implemented an efficient version of
such a method? It would be nice to have some idea of its speed in
practice before spending the significant programming time it would
need to code a (say) Mersenne-optimized version of it for various
hardware targets.
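
(For what it's worth, here is a small Python sketch of the finite-difference
trick Rich describes: after the first few group products are evaluated
directly, each further product of eight candidates in arithmetic progression
needs only eight bignum additions. Sieving is ignored here, which is of
course a real complication.)

from math import prod

def ap_group_products(a, d, num_groups, g=8):
    # Products of g consecutive terms of the AP a, a+d, a+2d, ..., taken a
    # group at a time.  G(n) = (a+g*n*d)(a+(g*n+1)*d)...(a+(g*n+g-1)*d) is a
    # degree-g polynomial in n, so after g+1 direct evaluations every further
    # group product follows from just g (bignum) additions.
    def direct(n):
        return prod(a + (g * n + j) * d for j in range(g))

    row = [direct(n) for n in range(g + 1)]   # seed values G(0) .. G(g)
    diffs = []
    while row:
        diffs.append(row[0])                  # k-th forward difference at n = 0
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    out = []
    for _ in range(num_groups):
        out.append(diffs[0])                  # current group product G(n)
        for k in range(g):                    # g additions give G(n+1) and its differences
            diffs[k] += diffs[k + 1]
    return out

# Candidate factors of 2^p-1 form the AP 2p+1, 4p+1, 6p+1, ... (step 2p).
p = 29
fast = ap_group_products(2 * p + 1, 2 * p, 12)
slow = [prod(2 * (8 * n + j + 1) * p + 1 for j in range(8)) for n in range(12)]
assert fast == slow                           # same products, only additions after the seed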

-Ernst


Mersenne: Re: Trial Factoring: Back to the math.

2002-02-12 Thread Bruce Leenstra

Bruce Leenstra wrote:

> What this list needs right now is a nice juicy math debate, so here goes:
> I was reading the faq about P-1 factoring, and it talks about constructing a
> 'q' that is the product of all primes less than B1 (with some multiples?)
> ...
>
> Right now Prime95 constructs a list of possible factors, filters it (before
> hand, or on-the-fly) and then uses the powering algorithm to calculate 2^p -
> 1 (mod q) for each factor.

Sandy Harris <[EMAIL PROTECTED]> replied:

>Off-the wall question dep't.: 
>
>How does the overhead for that compare to just calculating q, the product of
>all the primes of interest, and then gcd(q, target)? If you get 1, then q 
>and target have no common factors. If you get anything else, your result is
>the product of the common factors.

One of the advantages of testing each potential factor individually, or even
in pairs, is that you can stop when you find a factor. With the gcd method you
are always testing a large number of factors at once. This would be great if you
wanted to completely factor the composite Mersennes; it probably _would_ be
faster for that. 
You could study the distribution of factors and combine that with overhead
costs and speed comparisons to come up with the optimum number of factors to
test at a time. But if most of the factors found are small, the gcd
method would always end up being slower.

Regards,
Bruce Leenstra




Mersenne: Re: Trial Factoring: Back to the math.

2002-02-12 Thread EWMAYER
Bruce Leenstra wrote:

> What this list needs right now is a nice juicy math debate, so here goes:
> I was reading the faq about P-1 factoring, and it talks about constructing a
> 'q' that is the product of all primes less than B1 (with some multiples?)
> ...
>
> Right now Prime95 constructs a list of possible factors, filters it (before
> hand, or on-the-fly) and then uses the powering algorithm to calculate 2^p -
> 1 (mod q) for each factor.

Sandy Harris <[EMAIL PROTECTED]> replied:

>Off-the wall question dep't.: 
>
>How does the overhead for that compare to just calculating q, the product of
>all the primes of interest, and then gcd(q, target)? If you get 1, then q 
>and target have no common factors. If you get anything else, your result is
>the product of the common factors.

Actually, that's not off the wall (or out of left field, as fans
of baseball refer to it) at all. In fact, Richard Crandall mentions
such a scheme in at least one of his books (either Pomerance & Crandall
or Crandall's "Projects in Scientific Computing" or both.) Here's how
it compares to the sieve-based factoring: assume we have many candidate
factors of the number 2^p-1, having size <= q bits. (Precisely speaking,
we require an O(1) fraction of these being O(q) in size, but that's a
technicality.) Assuming q bits is on the order of a computer wordsize,
to test each one of these separately requires O(log2(p)) CPU operations
(i.e. the O(log2(p)) modular squarings of a binary powering algorithm.)

Using the gcd-based method, we multiply roughly p/q of the factor
candidates together to get a product of <= p bits, then gcd with 2^p-1.
The fastest known gcd algorithms need O(p * (log2(p))^2),
i.e. are O(log2(p)) times as expensive as an FFT-based big-integer
multiply of numbers this size. (That doesn't seem too bad at first
glance, but the O() hides a much larger proportionality constant than
the FFT, as well.) Dividing by the number of candidates that get
processed per gcd we see that this method needs O((log2(p))^2)
operations, where we've absorbed the q (which is assumed to be
order of unity, since a computer wordsize is such) into the implied
proportionality constant. That makes this method asymptotically
slower than one-factor-candidate-at-a-time sieving, and the added
overhead of the fast gcd imposes an additional speed penalty.

Thus, this method (which does have the advantage that it can use
floating-point FFT for the large-integer multiplies of the fast gcd)
would only become attractive if integer multiply were grindingly slow
or if one were dealing with some kind of vector hardware which could
do floating-point FFT screamingly fast but were somehow constrained
to being able to do just O(1) integer multiplies per cycle. Not a likely
combination, but it's an interesting theme nonetheless.

Note that one advantage of the gcd-based method is that factoring cost
doesn't depend overmuch on the hardware wordsize, since we can break
the factor product into whichever-sized chunks are convenient for the
FFT. For the sieving method, as soon as q exceeds the hardware wordsize
by even a single bit, we get hit with a 3-4x speed penalty. But given
that the gcd method is probably around 100x slower than sieving for
q no larger than a wordsize, it's going to take some very large factor
candidates indeed to make the gcd method attractive.

But keep those off-the-wallers coming - it's that kind of thinking
that often leads to real algorithmic breakthroughs!

Cheers,
-Ernst