On Fri, 20 Nov 1998 16:34:27 -0500, you wrote:

>Hi all,
>
>       Since this list had a few off-topic posts last week, I thought
>I'd post something that could be useful.
>
>       I'd like to challenge the readers to produce the best algorithm
>and/or code for finding factors of Mersenne numbers greater than 64-bits.
>You do not need to be a programmer to participate.
>
>
>Background
>----------
>
>       As many of you know the Intel chip supports floating point
>numbers with 64-bits of precision.  This allows prime95 to
>use very simple algorithms to compute 2^p mod f where f is a 64-bit
>number.  It uses a powering algorithm, very similar to the one you will find
>in Knuth.  Simply put, prime95 starts with the value of two and loops over
>each bit of p.  The loop computes value^2 mod f or 2*value^2 mod f
>depending of whether the bit in p is even or odd.
>       Prime95 needs new code to compute value^2 mod f where value
>and f are larger than 64 bits.

Everything I'm posting here is from Koc, C.K. et. al., Analyzing and
Comparing Montgomery Multiplication Algorithms, IEEE Micro, V16No3,
June 1996, pp26-33.  The paper discusses a number of ways to
calculate a^p mod n, where n is large.  Obviously, 2^p mod n is a
subset of this.  A quick summary :

MonExp (a, p, n)
{
    a' = ar mod n
    x' = r  mod n
    for i = (bits in p) - 1 to 0
    {
        x' = MonProd(x', x')
        if (bit i in p is 1)
            x' = MonProd (x', a')
    }
    return MonProd (x', 1)  // Transform from x' to x?
}

Basically, this is the algorithm George described above, with a
little tweaking of the initial values.  This fiddling makes the
MonProd function easier to compute:

MonProd (a', b')
{
    t = a' b'
    u = (t + (tn' mod r)n) / r
    if (n >= n)
        return u-n
    else
        return u
}

r is 2^(bits in p), so the mod r and divide by r are simple.  n' is
calculated once for the entire MonExp() loop, so it shouldn't be too
bad either (depending on how many times MonProd() will be called).

The reason I'm posting this is because the paper shows some fairly
good ways of calculating MonProd().  This isn't exactly the question
that was posed, but I'm wondering if the overhead involved with
computing a', x' and n' outweighs the (relative) ease of calculating
MonProd().  Or can this algorithm be adapted to the more general ab
mod n question (I'm guessing no, but that's really just a guess)?

The numbers they give, assuming a 32 bit word and an instruction to
multiply 2 32-bit words:  21 multiplies, 50 adds, 77 reads, 34
writes.  They define read and write as any reference of a variable,
though, so I'm hoping that there wouldn't be this many loads and
stores if we are smart about register allocation.  Also, since we
are really doing MonProd (a', a'), there should be some additional
room for improvement (I hope).  The algorithm (I don't pretend to
understand the math, I'm just an engineer) :

s is number of words = ceil(80/32) -> 3.  
C and S are 32 bit temporary variables (carry and sum, I suppose)
t[] is a scratch register initialized to 0s.
m is a 32 bit temp

for i = 0 to s-1
{
    C = 0                              
    for j = 0 to s-1
    {
       (C,S) = t[j] + a[j]b[i] + C
       t[j] = S                        
    }
    (C,S) = t[s] + C
    t[s]  = S
    t[s+1]= C
    
    m = t[0]n'[0] mod W     (W = 2^32, so the mod W should be free)
    (C,S) = t[0] + mn[0]
    for j = 1 to s-1
    {
        (C,S) = t[j] + mn[j] + C
        t[j-1]= S
    }
    (C,S) = t[s] + C
    t[s-1]= S
    t[s]  = t[s+1] + C
}
Result is in t[0] to t[s], subtract n if needed.

For some quick optimizations, unroll all the loops.  This gets rid
of a number of adds where C is 0, and a few more where t[i] is
initially 0.  Also, remember that a[j]b[i] is really a[j]a[i] ---
would it be worth it to save 3 multiplications by saving a[j]a[i]
and re-using them?

It's too late for me to start putting register names and opcodes
into the code, but hopefully this gives people a few ideas.  
[snip]



-- 
Kevin Jaget (or an FDA approved generic equivalent)
kcjaget at mindspring.com      

Reply via email to