Re: Mersenne: Fibonacci Series

1999-12-20 Thread Bill Daly

François Perruchaud wrote:

An old book of mine gives without proof an example of Fibonacci Sequence
that countains no primes, but where U(1) and U(2) are co-prime.
The sequence was found by R. L. Graham.
Reference :
"A Fibonacci-like sequence of composite numbers",
R.L. Graham, Math. Mag. 37, 1964.

U(1) = 1786772701928802632268715130455793
U(2) = 1059683225053915111058165141686995
U(N+2) = U(N+1) + U(N)

I only verified with Mapple that U(1) and U(2) are co-prime
and that U(N) is composite for N10.

Unfortunately, Graham made a computational error in calculating U(1) and
U(2), and the above values don't work as desired. The following will
correct the error:

 U(1) =  331635635998274737472200656430763
 U(2) = 1510028911088401971189590305498785
 U(N+2) = U(N+1) + U(N)

The trick is to create a "covering" of the indices, i.e. a set of
congruences such that every index satisfies at least one of the
congruences. The best way to understand this is by this example. Note
that F(4) is divisible by 3, and that every 4th term in the Fibonacci
sequence is also divisible by 3, i.e. the period of 3 is equal to 4.
Using the U() series instead of the F() series, it can be verified that
U(4n+2) is divisible by 3 for all n. Similarly,

 U(8n+4)   is divisible by7
 U(16n+8)  is divisible by   47
 U(32n+16) is divisible by 2207
 U(64n+32) is divisible by 1087
 U(64n)is divisible by 4481

This assures that U(2n) is composite for all n, since every even number
is either 2 mod 4, 4 mod 8, 8 mod 16, 16 mod 32, 32 mod 64, or 0 mod 64.
Note that F(8)/F(4) = 7, F(16)/F(8) = 47, F(32)/F(16) = 2207,
F(64)/F(32) = 1087*4481.

For the odd indices, consider separately the cases 6n+1, 6n+3 and 6n+5.
First, the period of 2 is 3, and U(6n+3) is divisible by 2 for all n (in
fact, U(3n) is divisible by 2, but at this point we only need the odd
values). For 6n+5, we have

 U(18n+5)  is divisible by   17 (actual period = 9)
 U(18n+11) is divisible by   19
 U(54n+17) is divisible by   53 (actual period = 27)
 U(54n+35) is divisible by  109 (actual period = 27)
 U(54n+53) is divisible by 5779

which covers all the cases U(6n+5). For 6n+1, we have

 U(30n+7)  is divisible by5 (actual period = 5)
 U(30n+13) is divisible by   11 (actual period = 10)
 U(30n+19) is divisible by   61 (actual period = 15)
 U(30n+25) is divisible by   31
 U(60n+1)  is divisible by 2521
 U(60n+31) is divisible by   41 (actual period = 20)

which covers all the cases U(6n+1). This completes the coverage of all
cases, thus U(n) is composite for all n.

Graham's error was that his original U(1) and U(2) gave U(64n+33)
divisible by 1087, rather than U(64n+32). This doesn't mean necessarily
that any U(n) was prime in his original sequence, only that U(n) could
not easily be proved composite.

Regards,

Bill
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: questions

1999-12-07 Thread Bill Daly

 why is f(0) in an ll test = 4

The value of f(0) must be such that f(0)-2 is a quadratic residue mod Mp
and f(0)+2 is a quadratic nonresidue mod Mp. 4 is the smallest value
which works for all p; 10 is the next, followed by 52. You could use
f(0) = 3 provided that 5 is a quadratic nonresidue mod Mp, which will be
the case when p = 3 mod 4, but 3 will not work when p = 1 mod 4. There
are some numbers, e.g. 6, which never work. The precise reasoning behind
this is a bit too complicated for a short reply.

Regards,

Bill
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: LL and Pollard-rho in one?

1999-11-02 Thread Bill Daly

The behavior of the recursion x[n+1] = x[n]^2 - 2 can be precisely
analyzed. (In fact, it is because of this that the LL-test works at
all.) For a fixed prime p, the periodicity depends on the factorization
of either p-1 (if x[1]^2-4 is a quadratic residue mod p) or p+1 (if
x[1]^2-4 is a quadratic nonresidue mod p). In either case, let the
factorization be t*2^m where t is odd. Without going into excruciating
detail, the length of the non-periodic portion starting with x[1] will
be at most m, and the length of the loop will be a divisor of phi(t)/2
(the Euler totient function). (The behavior of the recursion x[n+1] =
x[n]^2 is similar, except that only the factorization of p-1 is
involved.) For example, when p = 41, the loops are {-1}, {2}, {6,-7},
{14,-11,-4} and {8,-20,-12,19,-10,16}. The value x[1] = 13 has the
non-periodic portion 13, 3, 7 of length 3 before entering the {6,-7}
loop, corresponding to the factorization 41-1 = 5*2^3. The lengths of
the loops {2} and {6,-7} are factors of phi(5)/2 = 2. Similarly, the
lengths of the loops {-1}, {14,-11,-4} and {8,...,16} are factors of
phi(21)/2 = 6.

From the point of view of factorization, these periods are much too long
to be practical. If we use x[n+1] = x[n]^2 + k, where k  0 and k 
-2, then empirically the length of the loops mod p will be much smaller
than those for k = 0 or k = -2, while the length of the non-periodic
portion can easily be greater than m. For example, the recursion x[n+1]
= x[n]^2 - 6 mod 41 has three loops, {-2}, {3}, and {14,26}, and the
initial value x[1] = 0 has a non-periodic part of length 9. This is why
Pollard-rho works best for values of k other than 0 and -2.

An efficient Pollard-rho-like algorithm for factoring a Mersenne number
Mp is based on the iteration x[n+1] = x[n]^p + k. The idea is that
x[n+1] - x[n] = x[n]^p - x[n-1]^p = (x[n] - x[n-1]) * f[n], where f[n] =
(x[n]^p - x[n-1]^p) / (x[n] - x[n-1]) is known to have prime factors
which are all congruent to 1 mod p. Since Mp must also have prime
factors which are congruent to 1 mod p, this increases the chance that
gcd(x[n+1] - x[n], Mp) will produce a factor of Mp. Note that x[n+1] -
x[n] = f[n]*f[n-1]*f[n-2]*...*f[j]*(x[j] - x[j-1]), thus if any of the
terms in f[n]*f[n-1]*...*f[j] has a factor in common with Mp, this will
show up in the gcd. Ordinary Pollard-rho will also (given enough
iterations) find a factor of Mp, but the probability of this happening
within the first p-2 steps is minuscule. On the other hand, it would
cost little, after a failed LL-test, to calculate the gcd of the
difference of the last two results with Mp, just in case this happens to
reveal a factor. For that matter, one could just take the difference
between the final result and the last value stored in the p file.
Maybe if prime95 saved the final result in an r file, this test
could be performed later.

One interesting sidelight to this. The usual test for primality of a
Fermat number is to start with x[1] = 3, then calculate x[n+1] = x[n]^2
mod Fk up to x[2^k] mod Fk, which will be equal to -1 if and only if Fk
is in fact prime. Alternatively, one could start with y[1] = 3+1/3 mod
Fk (i.e., y[1] = (Fk+10)/3), then calculate y[n+1] = y[n]^2 - 2 mod Fk
up to y[2^k-1] mod Fk, which will be equal to 0 if and only if Fk is in
fact prime. This is because y[n] = x[n] + 1/x[n] mod Fk, thus y[2^k] =
(-1) + 1/(-1) = -2 mod Fk, thus y[2^k-1]^2 - 2 = y[2^k] = -2 mod Fk,
which implies y[2^k-1] = 0 mod Fk. If we calculate mod Fk(Fk+2) instead
of mod Fk, then the calculations can be carried out by prime95 with only
trivial modifications, and at the end we will have to reduce y[2^k-1]
mod Fk(Fk+2) to y[2^k-1] mod Fk to complete the test. Of course, there
are no longer any Fk within reach of current computers, so this is
merely of academic interest.

I suggested another (small) group of potential primes for testing in an
earlier message, but it never got through to the list (most likely
because my e-mail address has changed). Let F3(n) = (2^3^(n+1) - 1) /
(2^3^n - 1) = 2^(2*3^n) + 2^3^n + 1. F3(n) grows more quickly than the
Fermat numbers, so there aren't very many within reach of current
computers. There is a primality test for F3(n) which is similar to that
for Fermat numbers. 5 is a quadratic nonresidue of F3(n) for all n, thus
if F3(n) is prime, then 5^((F3(n)-1)/2) = -1 mod F3(n). On the other
hand, if 5^((F3(n)-1)/2) = -1 mod F3(n), and q is a divisor of F3(n),
then the congruence also holds mod q, which implies that q = 1 mod
2^3^n. However, since (2^3^n + 1)^2  F3(n), it is not possible for
F3(n) to have two factors both of which are congruent to 1 mod 2^3^n,
thus if 5^((F3(n)-1)/2) = -1 mod F3(n), F3(n) must be prime. Thus, start
with x[1] = 5 and calculate x[j+1] = x[j]^2 mod 2^3^(n+1)-1 up to
x[3^n+1], then y[1] = 5*x[3^n+1] mod 2^3^(n+1)-1, then y[j+1] = y[j]^2
mod 2^3^(n+1)-1 up to y[3^n]. Reduce y[3^n] mod 2^3^(n+1)-1 to y[3^n]
mod F3(n). If the result is -1, then F3(n) is 

Re: Mersenne: Lehmer question

1999-07-08 Thread Bill Daly

Peter-Lawrence.Montgomery wrote:

Problem A3 in Richard Guy's `Unsolved Problems in Number Theory'
includes this question, by D.H. Lehmer:

Let Mp = 2^p - 1 be a Mersenne prime, where p  2.
Denote S[1] = 4 and  S[k+1] = S[k]^2 - 2 for k = 1.
Then S[p-2] == +- 2^((p+1)/2) mod Mp.
Predict which congruence occurs.

Of course, the reason that S[p-2] == +- 2^((p+1)/2) mod Mp is that we
know that S[p-1] == 0 mod Mp, and S[p-1] = S[p-2]^2 - 2, thus S[p-2] is
a square root of 2 mod Mp. Then since 2^p == 1 mod Mp, 2^(p+1) == 2 mod
Mp, thus +- 2^((p+1)/2) are the square roots of 2 mod Mp, and S[p-2]
must be congruent to one of these mod Mp.

I don't see any easy way of predicting the sign, but the following might
be helpful.

We know that it is possible to use a value other than 4 for S[1] in the
LL sequence, e.g. we can set S[1] = 10 instead. Suppose that b is such
that if we set S[1] = b, then S[p-1] == 0 mod Mp. The necessary and
sufficient condition that b have this property is that b-2 is a
quadratic residue mod Mp and b+2 is a quadratic nonresidue mod Mp.

It turns out that there are exactly 2^(p-2) = (Mp+1)/4 values b which
have this property, and there is a systematic way of generating them.
Let L[0] = 2, L[1] = 4, L[j+2] = 4*L[j+1] - L[j]. (The sequence L[j] is
related to the sequence S[k] with S[1] = 4 by the following identity:
S[k] = L[2^(k-1)].) Let B[j] = L[2*j-1] for j = 1..2^(p-2). Then the
values B[j] mod Mp are all distinct, and each B[j] has the desired
property.

Of this set B[j], exactly half correspond to S[p-2] == +2^((p+1)/2) mod
Mp, and the other half correspond to S[p-2] == -2^((p+1)/2) mod Mp. The
two subsets are the set B1[j] = B[j] for j = 0,1,4,5 mod 8, and the set
B2[j] = B[j] for j = 2,3,6,7 mod 8. 4, which is B[1], belongs to B1. I
do not know which of the sets B1 and B2 corresponds to the + sign for
S[p-2]. Note that B[2] = 52 belongs to B2, thus the sequences beginning
with S[1] = 4 and S[1] = 52 have opposite signs for S[p-2].

The above, with a few modifications, also works for primes of the form
c*2^n - 1, where c  1 is odd and n is not necessarily prime. Suppose
that q = c*2^n - 1 is prime. Let b be such that b-2 is a quadratic
residue mod q and b+2 is a quadratic nonresidue mod q. Let L[0] = 2,
L[1] = b, L[j+2] = b*L[j+1] - L[j]. Let B[j] = L[c*(2*j-1)] for j =
1..2^(n-2). Let S[1] = B[j], S[k+1] = S[k]^2 - 2. Then S[p-1] == 0 mod
q, and S[p-2] is a square root of 2 mod q. Exactly half of the B[j]
correspond to a specific value of S[p-2]. The subsets B1 and B2 can be
defined in the same way as for the case c = 1.

Regards,

Bill
**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom
they are addressed.
This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.
**

Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm



Mersenne: Re: Questions on Crandall algorithm

1999-01-07 Thread Bill Daly

 From: "Olivier Langlois" [EMAIL PROTECTED]
 Date: Wed, 6 Jan 1999 22:31:24 -0500
 Subject: Mersenne: Questions on Crandall algorithm
 
 Hi,
 
 I'm trying to understand Crandall's algorithm but I'm not sure to get the
 idea behind it. So let me explain you what I've understand and I would
 appreciate it if you could tell me if I got it right.
 snip

Let's consider a simple example. Suppose that we have two values

  X = x0 + x1*2^4 + x2*2^8 + x3*2^12
  Y = y0 + y1*2^4 + y2*2^8 + y3*2^12

and we multiply them to obtain

  Z = X*Y = z0 + z1*2^4 + z2*2^8 + z3*2^12 + z4*2^16 + z5*2^20 + z6*2^24

Then we have

  z0 = x0*y0
  z1 = x0*y1 + x1*y0
  z2 = x0*y2 + x1*y1 + x2*y0
  z3 = x0*y3 + x1*y2 + x2*y1 + x3*y0
  z4 = x1*y3 + x2*y2 + x3*y1
  z5 = x2*y3 + x3*y2
  z6 = x3*y3

Sums of this form are called convolution sums, and the FFT essentially
provides a very efficient way to calculate the values of convolution
sums.

When we implement the FFT using complex floating-point arithmetic, the
values obtained are only approximations to the correct convolution sums,
because of the inevitable round-off error which builds up during the
calculation. If, however, we know that the coefficients in X and Y are
integers, then we expect the calculated convolution sums to be close to
integers, and if they are sufficiently close, we can simply round the
calculated sums to the nearest integer to recover the exact values. This
will work provided that the maximum error in any calculated convolution
sum is less than 0.5. On the Pentium, for example, floating-point values
are accurate to 53 bits, thus we might expect that in the calculation of
z3, we might accumulate a relative error of at most 3 bits, thus if the
integer part of z3 is less than 2^50, we should be able to round it to
the correct integer. This will be the case if the coefficients in X and
Y are less than 2^24, which is much more than adequate here, since in
fact the coefficients are limited to less than 2^4.

Now, let's modify the original setup:

  X = x0 + x1*2^5 + x2*2^9 + x3*2^13
  Y = y0 + y1*2^5 + y2*2^9 + y3*2^13

and we multiply them to obtain

  Z = X*Y = z0 + z1*2^5 + z2*2^9 + z3*2^13 + z4*2^17 + z5*2^22 + z6*2^26

Then we have

  z0 = x0*y0
  z1 = x0*y1 + x1*y0
  z2 = x0*y2 + 2*x1*y1 + x2*y0
  z3 = x0*y3 + 2*x1*y2 + 2*x2*y1 + x3*y0
  z4 = 2*x1*y3 + 2*x2*y2 + 2*x3*y1
  z5 = x2*y3 + x3*y2
  z6 = x3*y3

These values are very similar to convolution sums, except that certain
products have an extra factor of 2. Note that whereas in the original
setup, the exponents in X and Y were in arithmetic progression, in the
modified setup, they are nearly in arithmetic progression. We can put
them into arithmetic progression by replacing the powers of 2 in X, Y
and Z as follows:

  2^5 -  (2^(3/4)) * (2^(17/4))
  2^9 -  (2^(2/4)) * (2^(34/4))
  2^13 - (2^(1/4)) * (2^(51/4))
  2^17 - 1 * (2^(68/4))
  2^22 - (2^(3/4)) * (2^(85/4))
  2^26 - (2^(2/4)) * (2^(102/4))

so that for each term, e.g. x1*2^5, we write instead x1*2^5 =
x1*(2^(3/4))*(2^(17/4)) = x1'*2^(17/4), where x1' = x1*2^(3/4).
Multipliers like 2^(3/4) here are Crandall's two-to-the-phi values. We
then have

  X = x0' + x1'*2^(17/4) + x2'*2^(34/4) + x3'*2^(51/4)
  Y = y0' + y1'*2^(17/4) + y2'*2^(34/4) + y3'*2^(51/4)

and we multiply them to obtain

  Z = X*Y = z0' + z1'*2^(17/4) + z2'*2^(34/4) + z3'*2^(51/4) +
z4'*2^(68/4) + z5'*2^(85/4) + z6'*2^(102/4)

Then we have

  z0' = x0'*y0'
  z1' = x0'*y1' + x1'*y0'
  z2' = x0'*y2' + x1'*y1' + x2'*y0'
  z3' = x0'*y3' + x1'*y2' + x2'*y1' + x3'*y0'
  z4' = x1'*y3' + x2'*y2' + x3'*y1'
  z5' = x2'*y3' + x3'*y2'
  z6' = x3'*y3'

Since these are true convolution sums, we can use the FFT to compute
them. When we write them out with the original coefficients and the
two-to-the-phi multipliers, we get

  z0 = z0' = x0'*y0' = x0*y0
  z1*2^(3/4) = z1' = x0'*y1' + x1'*y0' = x0*y1*2^(3/4) + x1*y0*2^(3/4)
  z2*2^(2/4) = z2' = x0'*y2' + x1'*y1' + x2'*y0' = x0*y2*2^(2/4) +
x1*y1*2^(6/4) + x2*y0*2^(2/4)
  z3*2^(1/3) = z3' = x0'*y3' + x1'*y2' + x2'*y1' + x3'*y0' =
x0*y3*2^(1/4) + x1*y2*2^(5/4) + x2*y1*2^(5/4) + x3*y0*2^(1/4)
  z4 = z4' = x1'*y3' + x2'*y2' + x3'*y1' = x1*y3*2^(4/4) + x2*y2*2^(4/4)
+ x3*y1*2^(4/4)
  z5*2^(3/4) = z5' = x2'*y3' + x3'*y2' = x2*y3*2^(3/4) + x3*y2*2^(3/4)
  z6*2^(2/4) = z6' = x3'*y3' = x3*y3*2^(2/4)

thus, when we back out the two-to-the-phi multipliers, e.g. by
multiplying z1' by 2^(-3/4), we get

  z0 = x0*y0
  z1 = x0*y1 + x1*y0
  z2 = x0*y2 + 2*x1*y1 + x2*y0
  z3 = x0*y3 + 2*x1*y2 + 2*x2*y1 + x3*y0
  z4 = 2*x1*y3 + 2*x2*y2 + 2*x3*y1
  z5 = x2*y3 + x3*y2
  z6 = x3*y3

which are precisely the coefficients we are looking for. Note that when
the X and Y coefficients are integers, so are the Z coefficients, thus
again we can round off to the nearest integer to obtain the exact
result.

The upshot is that we can use the FFT not only to compute true
convolution sums, but also the near-convolution sums which arise when
the exponents in X and Y are nearly in 

Mersenne: Integer FFT

1998-12-17 Thread Bill Daly

The usual concept of an integer FFT is to choose a prime q such that q-1
is divisible by a reasonably large power of 2, say N = 2^n, find a
primitive (integer) N-th root of 1 mod q, say w, then use w and
arithmetic mod q to calculate the FFT. If it also happens that there is
an integer N-th root of 2 mod q, then the FFT can be converted into a
DWT suitable for implementing multiplication mod 2^p - 1 for some large
prime p. For example, with q = 2^64 - 2^32 + 1, we have both roots for N
up to 2^26, and for that matter we can also use an FFT/DWT of order N =
5*2^n up to 5*2^26. However, this requires calculating products of the
form a*b mod q where a and b are 64-bit integers, which, while
straightforward, will probably take more time than the corresponding
complex floating-point multiply in the ordinary FFT/DWT. It is also hard
to find appropriate primes q for which this works.

Richard Crandall has proposed implementing an FFT in the Galois field
GF(q^2), where q is itself a Mersenne prime, e.g. q = 2^61 - 1. He calls
this the fast Galois transform (FGT). The idea is that for any prime q =
3 mod 4, the field of Gaussian integers a + b*i mod q is isomorphic to
GF(q^2), thus we can simply replace complex real values with complex
integers mod q. The multiplicative order of GF(q^2) is q^2-1, thus for q
= 2^61 - 1, q^2 - 1 will be divisible by 2^62, thus we can find a
complex integer w such that w^2^62 = 1. Also, since the order of 2 in
GF(q^2) is 61, there will also be a value r such that r^2^62 = 2. We can
use r and w to implement a complex integer DWT mod q, which requires
code to add, subtract and multiply mod q. However, we still have the
problem that the calculation of a*b mod q is likely to take more time
than we would really like.

I have a suggestion. The FGT requires only a prime q = 3 mod 4, for
which q+1 is divisible by a reasonably large power of 2. Let's consider
primes q = k*2^n - 1 which are slightly less than 2^32. The idea here is
that this will require only 32-bit integer arithmetic. If we use two
such primes, q1 and q2, and we calculate separately the two sets of
convolution sums z1[0..N-1] mod q1 and z2[0..N-1] mod q2, then we can
use the Chinese Remainder Theorem to calculate the convolution sums
z[0..N-1] mod q1*q2. This is less difficult than it might seem, since we
can precalculate values u1 and u2 such that u1 = 1 mod q1, u1 = 0 mod
q2, u2 = 0 mod q1, u2 = 1 mod q2. Then if we have, from the two FGT
calculations, values z1[j] and z2[j] such that z[j] = z1[j] mod q1 and
z[j] = z2[j] mod q2, we have z[j] = u1*z1[j] + u2*z2[j] mod q1*q2, which
is easily verified to be correct. This calculation can be further
simplified because of the fact that u1 + u2 = 1 mod q1*q2 (in fact, u1 +
u2 = q1*q2 + 1). Thus, (u1+u2)*(z1[j]+z2[j]) = z1[j]+z2[j] mod q1*q2,
and z[j] = u1*z1[j] + u2*z2[j] = (z1[j] + z2[j] + (u1-u2)*(z1[j]-z2[j]))
/ 2 mod q1*q2. This requires only a single multiply
(u1-u2)*(z1[j]-z2[j]) mod q1*q2, together with some shifts and adds mod
q1*q2.

The rationale for this is that the efficacy of an integer FFT depends on
the size of the modulus. If the modulus is slightly less than 2^32, then
for N in the range used by GIMPS we may be able to use only 6-bit
coefficients, whereas with a modulus slightly less than 2^64, in this
case q1*q2, we will be able to use 22-bit coefficients, thereby
increasing the range of validity by a factor of nearly 4, in exchange
for using two FGTs instead of one and the Chinese Remainder step at the
end of each iteration.

It is worth noting also, in the particular case of the Pentium, that
there is an efficient method of calculating a*b mod q, for q  2^32,
using the floating-point unit. This is nice because FMUL is faster than
MUL on the Pentium, and because we can interleave integer adds and
subtracts mod q with floating-point multiplies mod q.

Here is a concrete example for N = 3*2^18:

  N-th root of 1  N-th root of 2
  q1 4293525503  2908044543+2957159114*i960683048
  q2 4292083711  4225761219+3412039801*i   1768092337

With these values, we can construct an FGT for both q1 and q2. To obtain
the convolution sum z[j] from z1[j] and z2[j], we have

  u1 = 11727005047594504564
  u2 =  6701165826594877070
  q1*q2 = 18428170874189381633
  u1-u2 =  5025839220999627494

We then multiply (u1-u2)*(z1[j]-z2[j]), which is the product of a signed
63-bit number by a signed 32-bit number. It is best to save the signs of
u1-u2 and z1[j]-z2[j] and use their absolute values, since a 64x32
unsigned multiply can be implemented with two 32x32 unsigned multiplies
to produce a 96-bit unsigned result. Reducing this mod q1*q2 is a bit
messy but doable; essentially, we want to use the FPU to calculate an
estimate of floor(product/q1*q2), then reduce product to product -
floor(product/q1*q2)*(q1*q2), then if necessary iterate until the result
is reduced mod q1*q2. Then, depending on the saved signs, we either add
or subtract this product from