[mpir-devel] Re: Need some review with these files

Bill Hart Tue, 23 Dec 2008 17:50:48 -0800

OK great. Can you commit to trunk after running that.

It should fix the --enable-cxx issue you reported. We've already
tested this fix previously and it should work. Just no one committed
it.


The automess I reported is probably due to a different version of
autohell on sage.math (the machine we had decided to use as default
for running auto stuff). We'll have to find another default machine to
run this stuff on. Yours seems to work for now Jason. But maybe we can
find a machine which isn't dependent on just one of the developers.

Bill.

2008/12/24  <ja...@njkfrudils.plus.com>:
>
> On Wednesday 24 December 2008 01:22:01 Bill Hart wrote:
>> I've got a fix for the --tag issue, but I can't commit it because
>> aclocal, autoconf, automake return a whole pile of errors. Seems like
>> the whole build system has gotten screwed up.
>
> runs fine on my system
>
> make distclean
>
>
> r...@box1:~/mpir/mpir/mpir/trunk# aclocal
> /usr/share/aclocal/nspr.m4:8: warning: underquoted definition of AM_PATH_NSPR
>  run info '(automake)Extending aclocal'
>  or see http://sources.redhat.com/automake/automake.html#Extending-aclocal
> /usr/share/aclocal/imlib.m4:9: warning: underquoted definition of
> AM_PATH_IMLIB
> /usr/share/aclocal/imlib.m4:167: warning: underquoted definition of
> AM_PATH_GDK_IMLIB
> /usr/share/aclocal/audiofile.m4:12: warning: underquoted definition of
> AM_PATH_AUDIOFILE
> /usr/share/aclocal/aalib.m4:12: warning: underquoted definition of
> AM_PATH_AALIB
> r...@box1:~/mpir/mpir/mpir/trunk# autoconf
> r...@box1:~/mpir/mpir/mpir/trunk# automake
> r...@box1:~/mpir/mpir/mpir/trunk# autoheader
> r...@box1:~/mpir/mpir/mpir/trunk#
>
> r...@box1:~/mpir/mpir/mpir/trunk# aclocal --version
> aclocal (GNU automake) 1.9.6
> Written by Tom Tromey <tro...@redhat.com>
>
> o...@box1:~/mpir/mpir/mpir/trunk# autoconf --version
> autoconf (GNU Autoconf) 2.61
> Copyright (C) 2006 Free Software Foundation, Inc.
> This is free software.  You may redistribute copies of it under the terms of
> the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by David J. MacKenzie and Akim Demaille.
>
>
>
> r...@box1:~/mpir/mpir/mpir/trunk# automake --version
> automake (GNU automake) 1.9.6
> Written by Tom Tromey <tro...@redhat.com>.
>
> Copyright 2005 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR
>
>
>
> o...@box1:~/mpir/mpir/mpir/trunk# autoheader --version
> autoheader (GNU Autoconf) 2.61
> Copyright (C) 2006 Free Software Foundation, Inc.
> This is free software.  You may redistribute copies of it under the terms of
> the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by Roland McGrath and Akim Demaille.
>
>
>
>>
>> I'll commit the fix for the --tag issue, but it won't take effect
>> until someone can successfully run aclocal, autoconf, automake and
>> commit the diff.
>>
>> Bill.
>>
>> 2008/12/24  <ja...@njkfrudils.plus.com>:
>> > On Wednesday 24 December 2008 01:15:08 Bill Hart wrote:
>> >> Are you sure it is not in there? It is in the version I checked out.
>> >
>> > Yeah , sorry it is there , just not right permissions so on a color term
>> > I miss it
>> >
>> >> Bill.
>> >>
>> >> 2008/12/24  <ja...@njkfrudils.plus.com>:
>> >> > On Wednesday 24 December 2008 00:51:20 ja...@njkfrudils.plus.com wrote:
>> >> >> On Wednesday 24 December 2008 00:31:50 Bill Hart wrote:
>> >> >> > On sage.math:
>> >> >> >
>> >> >> > cd tune
>> >> >> > make tune
>> >> >> >
>> >> >> > ./.libs/libspeed.a(gcd_bin.o): In function
>> >> >> > `__gmpn_ngcd_matrix_init': gcd_bin.c:(.text+0x0): multiple
>> >> >> > definition of
>> >> >> > `__gmpn_ngcd_matrix_init' gcd.o:gcd.c:(.text+0x0): first defined
>> >> >> > here ./.libs/libspeed.a(gcd_bin.o): In function
>> >> >> > `__gmpn_nhgcd_itch': gcd_bin.c:(.text+0x80): multiple definition of
>> >> >> > `__gmpn_nhgcd_itch' gcd.o:gcd.c:(.text+0x80): first defined here
>> >> >> > ./.libs/libspeed.a(gcd_bin.o): In function `__gmpn_nhgcd':
>> >> >> > gcd_bin.c:(.text+0xc4): multiple definition of `__gmpn_nhgcd'
>> >> >> > gcd.o:gcd.c:(.text+0xc4): first defined here
>> >> >> > ./.libs/libspeed.a(gcd_bin.o): In function `mpn_basic_gcd':
>> >> >> > gcd_bin.c:(.text+0x2ed): multiple definition of `mpn_basic_gcd'
>> >> >> > gcd.o:gcd.c:(.text+0x2ed): first defined here
>> >> >>
>> >> >> On my K8-linux same problem with make speed and tune
>> >> >>
>> >> >> ./configure && make && make check passes OK , and I got bored running
>> >> >> ./try mpn_gcd :)
>> >> >>
>> >> >> get this warning from the build though
>> >> >>
>> >> >> ngcd.c: In function 'mpn_ngcd':
>> >> >> ngcd.c:75: warning: implicit declaration of function 'mpn_basic_gcd'
>> >> >>
>> >> >>
>> >> >> gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I.. -O2
>> >> >> -m64 -march=k8 -mtune=k8 -c gcd.c  -fPIC -DPIC -o .libs/gcd.o
>> >> >> gcd.c: In function 'mpz_rgcd':
>> >> >> gcd.c:167: warning: implicit declaration of function 'mpn_rgcd'
>> >> >> gcd.c: In function 'mpz_bgcd':
>> >> >> gcd.c:171: warning: implicit declaration of function 'mpn_bgcd'
>> >> >> gcd.c: In function 'mpz_sgcd':
>> >> >> gcd.c:175: warning: implicit declaration of function 'mpn_sgcd'
>> >> >> gcd.c: In function 'mpz_ngcd':
>> >> >> gcd.c:179: warning: implicit declaration of function 'mpn_ngcd'
>> >> >>
>> >> >>
>> >> >> ./configure  --enable-alloca=debug --enable-assert && make && make
>> >> >> check passes OK
>> >> >>
>> >> >> but
>> >> >>
>> >> >> ./configure  --enable-alloca=debug --enable-assert --enable-nails=2
>> >> >> && make && make check
>> >> >>
>> >> >> PASS: t-mul_i
>> >> >> PASS: t-tdiv
>> >> >> PASS: t-tdiv_ui
>> >> >> PASS: t-fdiv
>> >> >> PASS: t-fdiv_ui
>> >> >> PASS: t-cdiv_ui
>> >> >> nhgcd2.c:206: GNU MP assertion failed: h0 == h1
>> >> >> /bin/sh: line 4: 31599 Aborted                 ${dir}$tst
>> >> >> FAIL: t-gcd
>> >> >> PASS: t-gcd_ui
>> >> >> nhgcd2.c:206: GNU MP assertion failed: h0 == h1
>> >> >> /bin/sh: line 4: 31647 Aborted                 ${dir}$tst
>> >> >> FAIL: t-lcm
>> >> >> PASS: dive
>> >> >> PASS: dive_ui
>> >> >> PASS: t-sqrtrem
>> >> >> PASS: convert
>> >> >> PASS: io
>> >> >
>> >> > ./configure  --enable-cxx --build=none && make && make check passes
>> >> > but
>> >> > ./configure  --enable-cxx && make && make check   Fails to build
>> >> >
>> >> > this again on K8-linux
>> >> >
>> >> > make[4]: Leaving directory `/root/mpir/mpir/mpir/trunk/yasm'
>> >> > make[3]: Leaving directory `/root/mpir/mpir/mpir/trunk/yasm'
>> >> > make[2]: Leaving directory `/root/mpir/mpir/mpir/trunk/yasmbuild'
>> >> > Making all in mpn
>> >> > make[2]: Entering directory `/root/mpir/mpir/mpir/trunk/mpn'
>> >> > /bin/sh ../libtool --tag=CC   --mode=compile
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_`echo fib_table | sed 's/_$//'`    -O2 -m64 -march=k8
>> >> > -mtune=k8 -c -o fib_table.lo fib_table.c
>> >> > mkdir .libs
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_fib_table -O2 -m64 -march=k8 -mtune=k8 -c fib_table.c
>> >> > -fPIC -DPIC -o .libs/fib_table.o
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_fib_table -O2 -m64 -march=k8 -mtune=k8 -c fib_table.c -o
>> >> > fib_table.o >/dev/null 2>&1
>> >> > /bin/sh ../libtool --tag=CC   --mode=compile
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_`echo mp_bases | sed 's/_$//'`    -O2 -m64 -march=k8
>> >> > -mtune=k8 -c -o mp_bases.lo mp_bases.c
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_mp_bases -O2 -m64 -march=k8 -mtune=k8 -c mp_bases.c  -fPIC
>> >> > -DPIC -o .libs/mp_bases.o
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_mp_bases -O2 -m64 -march=k8 -mtune=k8 -c mp_bases.c -o
>> >> > mp_bases.o >/dev/null 2>&1
>> >> > /bin/sh ../libtool --tag=CC   --mode=compile
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_`echo add | sed 's/_$//'`    -O2 -m64 -march=k8 -mtune=k8
>> >> > -c -o add.lo add.c
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_add -O2 -m64 -march=k8 -mtune=k8 -c add.c  -fPIC -DPIC -o
>> >> > .libs/add.o
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_add -O2 -m64 -march=k8 -mtune=k8 -c add.c -o add.o
>> >> > >/dev/null 2>&1
>> >> > /bin/sh ../libtool --tag=CC   --mode=compile
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_`echo add_1 | sed 's/_$//'`    -O2 -m64 -march=k8
>> >> > -mtune=k8 -c -o add_1.lo add_1.c
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_add_1 -O2 -m64 -march=k8 -mtune=k8 -c add_1.c  -fPIC -DPIC
>> >> > -o .libs/add_1.o
>> >> >
>> >> > gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
>> >> > -DOPERATION_add_1 -O2 -m64 -march=k8 -mtune=k8 -c add_1.c -o add_1.o
>> >> >
>> >> > >/dev/null 2>&1
>> >> >
>> >> > /bin/sh ../libtool --mode=compile ../strip_fPIC.sh ../yasm/yasm -f
>> >> > elf64 -o add_n.lo `test -f 'add_n.as' || echo './'`add_n.as
>> >> > libtool: compile: unable to infer tagged configuration
>> >> > libtool: compile: specify a tag with `--tag'
>> >> > make[2]: *** [add_n.lo] Error 1
>> >> > make[2]: Leaving directory `/root/mpir/mpir/mpir/trunk/mpn'
>> >> > make[1]: *** [all-recursive] Error 1
>> >> > make[1]: Leaving directory `/root/mpir/mpir/mpir/trunk'
>> >> > make: *** [all] Error 2
>> >> >
>> >> > config.guess from gmp is athlon64-unknown-linux-gnu
>> >> > note config.guess not in mpir , is it supposed to be ?
>> >> >
>> >> >> > Bill.
>> >> >> >
>> >> >> > 2008/12/23 Jason Martin <jason.worth.mar...@gmail.com>:
>> >> >> > > Attached are some edited versions of
>> >> >> > >
>> >> >> > > mpn/generic/gcd.c
>> >> >> > >
>> >> >> > > and
>> >> >> > >
>> >> >> > > mpn/generic/ngcd.c
>> >> >> > >
>> >> >> > > Drop them in, test them for correctness and speed.  Let me know
>> >> >> > > what breaks.  When everyone is happy, I'll check them in to svn
>> >> >> > >
>> >> >> > > --jason
>> >> >> > >
>> >> >> > > Jason Worth Martin
>> >> >> > > Asst. Professor of Mathematics
>> >> >> > > http://www.math.jmu.edu/~martin
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > /* Schönhage's 1987 algorithm, reorganized into hgcd form */
>> >> >> > >
>> >> >> > > #include <stdio.h>  /* for NULL */
>> >> >> > >
>> >> >> > > #include "gmp.h"
>> >> >> > > #include "gmp-impl.h"
>> >> >> > > #include "longlong.h"
>> >> >> > >
>> >> >> > >
>> >> >> > > /*
>> >> >> > > *****************************************************************
>> >> >> > >* * Here we are including the original GMP version of mpn_gcd *
>> >> >> > > but we rename it as mpn_basic_gcd.  It needs to be available *
>> >> >> > > for the ngcd algorithm to call in the base case.
>> >> >> > >  *
>> >> >> > >  *  Preconditions [U = (up, usize) and V = (vp, vsize)]:
>> >> >> > >  *
>> >> >> > >  *   1.  V is odd.
>> >> >> > >  *   2.  numbits(U) >= numbits(V).
>> >> >> > >  *
>> >> >> > >  *   Both U and V are destroyed by the operation.  The result is
>> >> >> > > left at vp, *   and its size is returned.
>> >> >> > >  *
>> >> >> > >  *   Ken Weber (kwe...@mat.ufrgs.br, kwe...@mcs.kent.edu)
>> >> >> > >  *
>> >> >> > >  *   Funding for this work has been partially provided by
>> >> >> > > Conselho *   Nacional de Desenvolvimento Cienti'fico e
>> >> >> > > Tecnolo'gico (CNPq) do *   Brazil, Grant 301314194-2, and was
>> >> >> > > done while I was a visiting *   reseacher in the Instituto de
>> >> >> > > Matema'tica at Universidade Federal *   do Rio Grande do Sul
>> >> >> > > (UFRGS).
>> >> >> > >  *
>> >> >> > >  *   Refer to K. Weber, The accelerated integer GCD algorithm,
>> >> >> > > ACM *      Transactions on Mathematical Software, v. 21 (March),
>> >> >> > > 1995, *      pp. 111-122.
>> >> >> > >  *
>> >> >> > >  *
>> >> >> > > *****************************************************************
>> >> >> > >/
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > /* If MIN (usize, vsize) >= GCD_ACCEL_THRESHOLD, then the
>> >> >> > > accelerated algorithm is used, otherwise the binary algorithm is
>> >> >> > > used.  This may be adjusted for different architectures.  */
>> >> >> > > #ifndef GCD_ACCEL_THRESHOLD
>> >> >> > > #define GCD_ACCEL_THRESHOLD 5
>> >> >> > > #endif
>> >> >> > >
>> >> >> > > /* When U and V differ in size by more than BMOD_THRESHOLD, the
>> >> >> > > accelerated algorithm reduces using the bmod operation.
>> >> >> > > Otherwise, the k-ary reduction is used.  0 <= BMOD_THRESHOLD <
>> >> >> > > GMP_NUMB_BITS. */ enum
>> >> >> > >  {
>> >> >> > >    BMOD_THRESHOLD = GMP_NUMB_BITS/2
>> >> >> > >  };
>> >> >> > >
>> >> >> > >
>> >> >> > > /* Use binary algorithm to compute V <-- GCD (V, U) for usize,
>> >> >> > > vsize == 2. Both U and V must be odd.  */
>> >> >> > > static inline mp_size_t
>> >> >> > > gcd_2 (mp_ptr vp, mp_srcptr up)
>> >> >> > > {
>> >> >> > >  mp_limb_t u0, u1, v0, v1;
>> >> >> > >  mp_size_t vsize;
>> >> >> > >
>> >> >> > >  u0 = up[0];
>> >> >> > >  u1 = up[1];
>> >> >> > >  v0 = vp[0];
>> >> >> > >  v1 = vp[1];
>> >> >> > >
>> >> >> > >  while (u1 != v1 && u0 != v0)
>> >> >> > >    {
>> >> >> > >      unsigned long int r;
>> >> >> > >      if (u1 > v1)
>> >> >> > >        {
>> >> >> > >          u1 -= v1 + (u0 < v0);
>> >> >> > >          u0 = (u0 - v0) & GMP_NUMB_MASK;
>> >> >> > >          count_trailing_zeros (r, u0);
>> >> >> > >          u0 = ((u1 << (GMP_NUMB_BITS - r)) & GMP_NUMB_MASK) | (u0
>> >> >> > > >> r); u1 >>= r;
>> >> >> > >        }
>> >> >> > >      else  /* u1 < v1.  */
>> >> >> > >        {
>> >> >> > >          v1 -= u1 + (v0 < u0);
>> >> >> > >          v0 = (v0 - u0) & GMP_NUMB_MASK;
>> >> >> > >          count_trailing_zeros (r, v0);
>> >> >> > >          v0 = ((v1 << (GMP_NUMB_BITS - r)) & GMP_NUMB_MASK) | (v0
>> >> >> > > >> r); v1 >>= r;
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  vp[0] = v0, vp[1] = v1, vsize = 1 + (v1 != 0);
>> >> >> > >
>> >> >> > >  /* If U == V == GCD, done.  Otherwise, compute GCD (V, |U - V|).
>> >> >> > > */ if (u1 == v1 && u0 == v0)
>> >> >> > >    return vsize;
>> >> >> > >
>> >> >> > >  v0 = (u0 == v0) ? (u1 > v1) ? u1-v1 : v1-u1 : (u0 > v0) ? u0-v0
>> >> >> > > : v0-u0; vp[0] = mpn_gcd_1 (vp, vsize, v0);
>> >> >> > >
>> >> >> > >  return 1;
>> >> >> > > }
>> >> >> > >
>> >> >> > > /* The function find_a finds 0 < N < 2^GMP_NUMB_BITS such that
>> >> >> > > there exists 0 < |D| < 2^GMP_NUMB_BITS, and N == D * C mod
>> >> >> > > 2^(2*GMP_NUMB_BITS). In the reference article, D was computed
>> >> >> > > along with N, but it is better to compute D separately as D <-- N
>> >> >> > > / C mod 2^(GMP_NUMB_BITS + 1), treating the result as a twos'
>> >> >> > > complement signed integer.
>> >> >> > >
>> >> >> > >   Initialize N1 to C mod 2^(2*GMP_NUMB_BITS).  According to the
>> >> >> > > reference article, N2 should be initialized to
>> >> >> > > 2^(2*GMP_NUMB_BITS), but we use 2^(2*GMP_NUMB_BITS) - N1 to start
>> >> >> > > the calculations within double precision.  If N2 > N1 initially,
>> >> >> > > the first iteration of the while loop will swap them.  In all
>> >> >> > > other situations, N1 >= N2 is maintained.  */
>> >> >> > >
>> >> >> > > #if HAVE_NATIVE_mpn_gcd_finda
>> >> >> > > #define find_a(cp)  mpn_gcd_finda (cp)
>> >> >> > >
>> >> >> > > #else
>> >> >> > > static
>> >> >> > > #if ! defined (__i386__)
>> >> >> > > inline                          /* don't inline this for the x86
>> >> >> > > */ #endif
>> >> >> > > mp_limb_t
>> >> >> > > find_a (mp_srcptr cp)
>> >> >> > > {
>> >> >> > >  unsigned long int leading_zero_bits = 0;
>> >> >> > >
>> >> >> > >  mp_limb_t n1_l = cp[0];       /* N1 == n1_h * 2^GMP_NUMB_BITS +
>> >> >> > > n1_l. */ mp_limb_t n1_h = cp[1];
>> >> >> > >
>> >> >> > >  mp_limb_t n2_l = (-n1_l & GMP_NUMB_MASK);     /* N2 == n2_h *
>> >> >> > > 2^GMP_NUMB_BITS + n2_l.  */ mp_limb_t n2_h = (~n1_h &
>> >> >> > > GMP_NUMB_MASK);
>> >> >> > >
>> >> >> > >  /* Main loop.  */
>> >> >> > >  while (n2_h != 0)             /* While N2 >= 2^GMP_NUMB_BITS.
>> >> >> > > */ {
>> >> >> > >      /* N1 <-- N1 % N2.  */
>> >> >> > >      if (((GMP_NUMB_HIGHBIT >> leading_zero_bits) & n2_h) == 0)
>> >> >> > >        {
>> >> >> > >          unsigned long int i;
>> >> >> > >          count_leading_zeros (i, n2_h);
>> >> >> > >          i -= GMP_NAIL_BITS;
>> >> >> > >          i -= leading_zero_bits;
>> >> >> > >          leading_zero_bits += i;
>> >> >> > >          n2_h = ((n2_h << i) & GMP_NUMB_MASK) | (n2_l >>
>> >> >> > > (GMP_NUMB_BITS - i)); n2_l = (n2_l << i) & GMP_NUMB_MASK;
>> >> >> > >          do
>> >> >> > >            {
>> >> >> > >              if (n1_h > n2_h || (n1_h == n2_h && n1_l >= n2_l))
>> >> >> > >                {
>> >> >> > >                  n1_h -= n2_h + (n1_l < n2_l);
>> >> >> > >                  n1_l = (n1_l - n2_l) & GMP_NUMB_MASK;
>> >> >> > >                }
>> >> >> > >              n2_l = (n2_l >> 1) | ((n2_h << (GMP_NUMB_BITS - 1))
>> >> >> > > & GMP_NUMB_MASK); n2_h >>= 1;
>> >> >> > >              i -= 1;
>> >> >> > >            }
>> >> >> > >          while (i != 0);
>> >> >> > >        }
>> >> >> > >      if (n1_h > n2_h || (n1_h == n2_h && n1_l >= n2_l))
>> >> >> > >        {
>> >> >> > >          n1_h -= n2_h + (n1_l < n2_l);
>> >> >> > >          n1_l = (n1_l - n2_l) & GMP_NUMB_MASK;
>> >> >> > >        }
>> >> >> > >
>> >> >> > >      MP_LIMB_T_SWAP (n1_h, n2_h);
>> >> >> > >      MP_LIMB_T_SWAP (n1_l, n2_l);
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  return n2_l;
>> >> >> > > }
>> >> >> > > #endif
>> >> >> > >
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_basic_gcd (mp_ptr gp, mp_ptr up, mp_size_t usize, mp_ptr vp,
>> >> >> > > mp_size_t vsize) {
>> >> >> > >  mp_ptr orig_vp = vp;
>> >> >> > >  mp_size_t orig_vsize = vsize;
>> >> >> > >  int binary_gcd_ctr;           /* Number of times binary gcd will
>> >> >> > > execute.  */ mp_size_t scratch;
>> >> >> > >  mp_ptr tp;
>> >> >> > >  TMP_DECL;
>> >> >> > >
>> >> >> > >  ASSERT (usize >= 1);
>> >> >> > >  ASSERT (vsize >= 1);
>> >> >> > >  ASSERT (usize >= vsize);
>> >> >> > >  ASSERT (vp[0] & 1);
>> >> >> > >  ASSERT (up[usize - 1] != 0);
>> >> >> > >  ASSERT (vp[vsize - 1] != 0);
>> >> >> > > #if WANT_ASSERT
>> >> >> > >  if (usize == vsize)
>> >> >> > >    {
>> >> >> > >      int  uzeros, vzeros;
>> >> >> > >      count_leading_zeros (uzeros, up[usize - 1]);
>> >> >> > >      count_leading_zeros (vzeros, vp[vsize - 1]);
>> >> >> > >      ASSERT (uzeros <= vzeros);
>> >> >> > >    }
>> >> >> > > #endif
>> >> >> > >  ASSERT (! MPN_OVERLAP_P (up, usize, vp, vsize));
>> >> >> > >  ASSERT (MPN_SAME_OR_SEPARATE2_P (gp, vsize, up, usize));
>> >> >> > >  ASSERT (MPN_SAME_OR_SEPARATE2_P (gp, vsize, vp, vsize));
>> >> >> > >
>> >> >> > >  TMP_MARK;
>> >> >> > >
>> >> >> > >  /* Use accelerated algorithm if vsize is over
>> >> >> > > GCD_ACCEL_THRESHOLD. Two EXTRA limbs for U and V are required for
>> >> >> > > kary reduction.  */ if (vsize >= GCD_ACCEL_THRESHOLD)
>> >> >> > >    {
>> >> >> > >      unsigned long int vbitsize, d;
>> >> >> > >      mp_ptr orig_up = up;
>> >> >> > >      mp_size_t orig_usize = usize;
>> >> >> > >      mp_ptr anchor_up = (mp_ptr) TMP_ALLOC ((usize + 2) *
>> >> >> > > BYTES_PER_MP_LIMB);
>> >> >> > >
>> >> >> > >      MPN_COPY (anchor_up, orig_up, usize);
>> >> >> > >      up = anchor_up;
>> >> >> > >
>> >> >> > >      count_leading_zeros (d, up[usize - 1]);
>> >> >> > >      d -= GMP_NAIL_BITS;
>> >> >> > >      d = usize * GMP_NUMB_BITS - d;
>> >> >> > >      count_leading_zeros (vbitsize, vp[vsize - 1]);
>> >> >> > >      vbitsize -= GMP_NAIL_BITS;
>> >> >> > >      vbitsize = vsize * GMP_NUMB_BITS - vbitsize;
>> >> >> > >      ASSERT (d >= vbitsize);
>> >> >> > >      d = d - vbitsize + 1;
>> >> >> > >
>> >> >> > >      /* Use bmod reduction to quickly discover whether V divides
>> >> >> > > U. */ up[usize++] = 0;                          /* Insert leading
>> >> >> > > zero. */ mpn_bdivmod (up, up, usize, vp, vsize, d);
>> >> >> > >
>> >> >> > >      /* Now skip U/V mod 2^d and any low zero limbs.  */
>> >> >> > >      d /= GMP_NUMB_BITS, up += d, usize -= d;
>> >> >> > >      while (usize != 0 && up[0] == 0)
>> >> >> > >        up++, usize--;
>> >> >> > >
>> >> >> > >      if (usize == 0)                           /* GCD == ORIG_V.
>> >> >> > > */ goto done;
>> >> >> > >
>> >> >> > >      vp = (mp_ptr) TMP_ALLOC ((vsize + 2) * BYTES_PER_MP_LIMB);
>> >> >> > >      MPN_COPY (vp, orig_vp, vsize);
>> >> >> > >
>> >> >> > >      do                                        /* Main loop.  */
>> >> >> > >        {
>> >> >> > >          /* mpn_com_n can't be used here because anchor_up and up
>> >> >> > > may partially overlap */
>> >> >> > >          if ((up[usize - 1] & GMP_NUMB_HIGHBIT) != 0)  /* U < 0;
>> >> >> > > take twos' compl. */ {
>> >> >> > >              mp_size_t i;
>> >> >> > >              anchor_up[0] = -up[0] & GMP_NUMB_MASK;
>> >> >> > >              for (i = 1; i < usize; i++)
>> >> >> > >                anchor_up[i] = (~up[i] & GMP_NUMB_MASK);
>> >> >> > >              up = anchor_up;
>> >> >> > >            }
>> >> >> > >
>> >> >> > >          MPN_NORMALIZE_NOT_ZERO (up, usize);
>> >> >> > >
>> >> >> > >          if ((up[0] & 1) == 0)                 /* Result even;
>> >> >> > > remove twos. */ {
>> >> >> > >              unsigned int r;
>> >> >> > >              count_trailing_zeros (r, up[0]);
>> >> >> > >              mpn_rshift (anchor_up, up, usize, r);
>> >> >> > >              usize -= (anchor_up[usize - 1] == 0);
>> >> >> > >            }
>> >> >> > >          else if (anchor_up != up)
>> >> >> > >            MPN_COPY_INCR (anchor_up, up, usize);
>> >> >> > >
>> >> >> > >          MPN_PTR_SWAP (anchor_up,usize, vp,vsize);
>> >> >> > >          up = anchor_up;
>> >> >> > >
>> >> >> > >          if (vsize <= 2)               /* Kary can't handle < 2
>> >> >> > > limbs and */ break;                      /* isn't efficient for
>> >> >> > > == 2 limbs. */
>> >> >> > >
>> >> >> > >          d = vbitsize;
>> >> >> > >          count_leading_zeros (vbitsize, vp[vsize - 1]);
>> >> >> > >          vbitsize -= GMP_NAIL_BITS;
>> >> >> > >          vbitsize = vsize * GMP_NUMB_BITS - vbitsize;
>> >> >> > >          d = d - vbitsize + 1;
>> >> >> > >
>> >> >> > >          if (d > BMOD_THRESHOLD)       /* Bmod reduction.  */
>> >> >> > >            {
>> >> >> > >              up[usize++] = 0;
>> >> >> > >              mpn_bdivmod (up, up, usize, vp, vsize, d);
>> >> >> > >              d /= GMP_NUMB_BITS, up += d, usize -= d;
>> >> >> > >            }
>> >> >> > >          else                          /* Kary reduction.  */
>> >> >> > >            {
>> >> >> > >              mp_limb_t bp[2], cp[2];
>> >> >> > >
>> >> >> > >              /* C <-- V/U mod 2^(2*GMP_NUMB_BITS).  */
>> >> >> > >              {
>> >> >> > >                mp_limb_t u_inv, hi, lo;
>> >> >> > >                modlimb_invert (u_inv, up[0]);
>> >> >> > >                cp[0] = (vp[0] * u_inv) & GMP_NUMB_MASK;
>> >> >> > >                umul_ppmm (hi, lo, cp[0], up[0] << GMP_NAIL_BITS);
>> >> >> > >                lo >>= GMP_NAIL_BITS;
>> >> >> > >                cp[1] = (vp[1] - hi - cp[0] * up[1]) * u_inv &
>> >> >> > > GMP_NUMB_MASK; }
>> >> >> > >
>> >> >> > >              /* U <-- find_a (C)  *  U.  */
>> >> >> > >              up[usize] = mpn_mul_1 (up, up, usize, find_a (cp));
>> >> >> > >              usize++;
>> >> >> > >
>> >> >> > >              /* B <-- A/C == U/V mod 2^(GMP_NUMB_BITS + 1).
>> >> >> > >                  bp[0] <-- U/V mod 2^GMP_NUMB_BITS and
>> >> >> > >                  bp[1] <-- ( (U - bp[0] * V)/2^GMP_NUMB_BITS ) /
>> >> >> > > V mod 2
>> >> >> > >
>> >> >> > >                  Like V/U above, but simplified because only the
>> >> >> > > low bit of bp[1] is wanted. */
>> >> >> > >              {
>> >> >> > >                mp_limb_t  v_inv, hi, lo;
>> >> >> > >                modlimb_invert (v_inv, vp[0]);
>> >> >> > >                bp[0] = (up[0] * v_inv) & GMP_NUMB_MASK;
>> >> >> > >                umul_ppmm (hi, lo, bp[0], vp[0] << GMP_NAIL_BITS);
>> >> >> > >                lo >>= GMP_NAIL_BITS;
>> >> >> > >                bp[1] = (up[1] + hi + (bp[0] & vp[1])) & 1;
>> >> >> > >              }
>> >> >> > >
>> >> >> > >              up[usize++] = 0;
>> >> >> > >              if (bp[1] != 0)   /* B < 0: U <-- U + (-B)  * V.  */
>> >> >> > >                {
>> >> >> > >                   mp_limb_t c = mpn_addmul_1 (up, vp, vsize,
>> >> >> > > -bp[0] & GMP_NUMB_MASK); mpn_add_1 (up + vsize, up + vsize, usize
>> >> >> > > - vsize, c); } else              /* B >= 0:  U <-- U - B * V.  */
>> >> >> > > { mp_limb_t b = mpn_submul_1 (up, vp, vsize, bp[0]); mpn_sub_1
>> >> >> > > (up + vsize, up + vsize, usize - vsize, b); }
>> >> >> > >
>> >> >> > >              up += 2, usize -= 2;  /* At least two low limbs are
>> >> >> > > zero. */ }
>> >> >> > >
>> >> >> > >          /* Must remove low zero limbs before complementing.  */
>> >> >> > >          while (usize != 0 && up[0] == 0)
>> >> >> > >            up++, usize--;
>> >> >> > >        }
>> >> >> > >      while (usize != 0);
>> >> >> > >
>> >> >> > >      /* Compute GCD (ORIG_V, GCD (ORIG_U, V)).  Binary will
>> >> >> > > execute twice.  */ up = orig_up, usize = orig_usize;
>> >> >> > >      binary_gcd_ctr = 2;
>> >> >> > >    }
>> >> >> > >  else
>> >> >> > >    binary_gcd_ctr = 1;
>> >> >> > >
>> >> >> > >  scratch = MPN_NGCD_LEHMER_ITCH (vsize);
>> >> >> > >  if (usize + 1 > scratch)
>> >> >> > >    scratch = usize + 1;
>> >> >> > >
>> >> >> > >  tp = TMP_ALLOC_LIMBS (scratch);
>> >> >> > >
>> >> >> > >  /* Finish up with the binary algorithm.  Executes once or twice.
>> >> >> > > */ for ( ; binary_gcd_ctr--; up = orig_vp, usize = orig_vsize) {
>> >> >> > > #if 1
>> >> >> > >      if (usize > vsize)
>> >> >> > >        {
>> >> >> > >          /* FIXME: Could use mpn_bdivmod. */
>> >> >> > >          mp_size_t rsize;
>> >> >> > >
>> >> >> > >          mpn_tdiv_qr (tp + vsize, tp, 0, up, usize, vp, vsize);
>> >> >> > >          rsize = vsize;
>> >> >> > >          MPN_NORMALIZE (tp, rsize);
>> >> >> > >          if (rsize == 0)
>> >> >> > >            continue;
>> >> >> > >
>> >> >> > >          MPN_COPY (up, tp, vsize);
>> >> >> > >        }
>> >> >> > >      vsize = mpn_ngcd_lehmer (vp, up, vp, vsize, tp);
>> >> >> > > #else
>> >> >> > >      if (usize > 2)            /* First make U close to V in
>> >> >> > > size. */ {
>> >> >> > >          unsigned long int vbitsize, d;
>> >> >> > >          count_leading_zeros (d, up[usize - 1]);
>> >> >> > >          d -= GMP_NAIL_BITS;
>> >> >> > >          d = usize * GMP_NUMB_BITS - d;
>> >> >> > >          count_leading_zeros (vbitsize, vp[vsize - 1]);
>> >> >> > >          vbitsize -= GMP_NAIL_BITS;
>> >> >> > >          vbitsize = vsize * GMP_NUMB_BITS - vbitsize;
>> >> >> > >          d = d - vbitsize - 1;
>> >> >> > >          if (d != -(unsigned long int)1 && d > 2)
>> >> >> > >            {
>> >> >> > >              mpn_bdivmod (up, up, usize, vp, vsize, d);  /*
>> >> >> > > Result > 0. */ d /= (unsigned long int)GMP_NUMB_BITS, up += d,
>> >> >> > > usize -= d; } }
>> >> >> > >
>> >> >> > >      /* Start binary GCD.  */
>> >> >> > >      do
>> >> >> > >        {
>> >> >> > >          mp_size_t zeros;
>> >> >> > >
>> >> >> > >          /* Make sure U is odd.  */
>> >> >> > >          MPN_NORMALIZE (up, usize);
>> >> >> > >          while (up[0] == 0)
>> >> >> > >            up += 1, usize -= 1;
>> >> >> > >          if ((up[0] & 1) == 0)
>> >> >> > >            {
>> >> >> > >              unsigned int r;
>> >> >> > >              count_trailing_zeros (r, up[0]);
>> >> >> > >              mpn_rshift (up, up, usize, r);
>> >> >> > >              usize -= (up[usize - 1] == 0);
>> >> >> > >            }
>> >> >> > >
>> >> >> > >          /* Keep usize >= vsize.  */
>> >> >> > >          if (usize < vsize)
>> >> >> > >            MPN_PTR_SWAP (up, usize, vp, vsize);
>> >> >> > >
>> >> >> > >          if (usize <= 2)                               /* Double
>> >> >> > > precision. */ {
>> >> >> > >              if (vsize == 1)
>> >> >> > >                vp[0] = mpn_gcd_1 (up, usize, vp[0]);
>> >> >> > >              else
>> >> >> > >                vsize = gcd_2 (vp, up);
>> >> >> > >              break;                                    /* Binary
>> >> >> > > GCD done.  */ }
>> >> >> > >
>> >> >> > >          /* Count number of low zero limbs of U - V.  */
>> >> >> > >          for (zeros = 0; up[zeros] == vp[zeros] && ++zeros !=
>> >> >> > > vsize; ) continue;
>> >> >> > >
>> >> >> > >          /* If U < V, swap U and V; in any case, subtract V from
>> >> >> > > U. */ if (zeros == vsize)                           /* Subtract
>> >> >> > > done. */ up += zeros, usize -= zeros;
>> >> >> > >          else if (usize == vsize)
>> >> >> > >            {
>> >> >> > >              mp_size_t size = vsize;
>> >> >> > >              do
>> >> >> > >                size--;
>> >> >> > >              while (up[size] == vp[size]);
>> >> >> > >              if (up[size] < vp[size])                  /* usize
>> >> >> > > == vsize. */ MP_PTR_SWAP (up, vp);
>> >> >> > >              up += zeros, usize = size + 1 - zeros;
>> >> >> > >              mpn_sub_n (up, up, vp + zeros, usize);
>> >> >> > >            }
>> >> >> > >          else
>> >> >> > >            {
>> >> >> > >              mp_size_t size = vsize - zeros;
>> >> >> > >              up += zeros, usize -= zeros;
>> >> >> > >              if (mpn_sub_n (up, up, vp + zeros, size))
>> >> >> > >                {
>> >> >> > >                  while (up[size] == 0)                 /*
>> >> >> > > Propagate borrow. */ up[size++] = -(mp_limb_t)1;
>> >> >> > >                  up[size] -= 1;
>> >> >> > >                }
>> >> >> > >            }
>> >> >> > >        }
>> >> >> > >      while (usize);                                    /* End
>> >> >> > > binary GCD. */ #endif
>> >> >> > >    }
>> >> >> > >
>> >> >> > > done:
>> >> >> > >  if (vp != gp)
>> >> >> > >    MPN_COPY_INCR (gp, vp, vsize);
>> >> >> > >  TMP_FREE;
>> >> >> > >  return vsize;
>> >> >> > > }
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > /*
>> >> >> > > *****************************************************************
>> >> >> > >* * END of original GMP mpn_gcd
>> >> >> > >  *
>> >> >> > > *****************************************************************
>> >> >> > >/
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > /* For input of size n, matrix elements are of size at most
>> >> >> > > ceil(n/2) - 1, but we need one limb extra. */
>> >> >> > >
>> >> >> > > void
>> >> >> > > mpn_ngcd_matrix_init (struct ngcd_matrix *M, mp_size_t n, mp_ptr
>> >> >> > > p) {
>> >> >> > >  mp_size_t s = (n+1)/2;
>> >> >> > >  M->alloc = s;
>> >> >> > >  M->n = 1;
>> >> >> > >  MPN_ZERO (p, 4 * s);
>> >> >> > >  M->p[0][0] = p;
>> >> >> > >  M->p[0][1] = p + s;
>> >> >> > >  M->p[1][0] = p + 2 * s;
>> >> >> > >  M->p[1][1] = p + 3 * s;
>> >> >> > >  M->tp = p + 4*s;
>> >> >> > >
>> >> >> > >  M->p[0][0][0] = M->p[1][1][0] = 1;
>> >> >> > > }
>> >> >> > >
>> >> >> > > #define NHGCD_BASE_ITCH MPN_NGCD_STEP_ITCH
>> >> >> > >
>> >> >> > > /* Reduces a,b until |a-b| fits in n/2 + 1 limbs. Constructs
>> >> >> > > matrix M with elements of size at most (n+1)/2 - 1. Returns new
>> >> >> > > size of a, b, or zero if no reduction is possible. */
>> >> >> > > static mp_size_t
>> >> >> > > nhgcd_base (mp_ptr ap, mp_ptr bp, mp_size_t n,
>> >> >> > >            struct ngcd_matrix *M, mp_ptr tp)
>> >> >> > > {
>> >> >> > >  mp_size_t s = n/2 + 1;
>> >> >> > >  mp_size_t nn;
>> >> >> > >
>> >> >> > >  ASSERT (n > s);
>> >> >> > >  ASSERT (ap[n-1] > 0 || bp[n-1] > 0);
>> >> >> > >
>> >> >> > >  nn = mpn_ngcd_step (n, ap, bp, s, M, tp);
>> >> >> > >  if (!nn)
>> >> >> > >    return 0;
>> >> >> > >
>> >> >> > >  for (;;)
>> >> >> > >    {
>> >> >> > >      n = nn;
>> >> >> > >      ASSERT (n > s);
>> >> >> > >      nn = mpn_ngcd_step (n, ap, bp, s, M, tp);
>> >> >> > >      if (!nn )
>> >> >> > >        return n;
>> >> >> > >    }
>> >> >> > > }
>> >> >> > >
>> >> >> > > /* Size analysis for nhgcd:
>> >> >> > >
>> >> >> > >   For the recursive calls, we have n1 <= ceil(n / 2). Then the
>> >> >> > >   storage need is determined by the storage for the recursive
>> >> >> > > call computing M1, and ngcd_matrix_adjust and ngcd_matrix_mul
>> >> >> > > calls that use M1 (after this, the storage needed for M1 can be
>> >> >> > > recycled).
>> >> >> > >
>> >> >> > >   Let S(r) denote the required storage. For M1 we need 5 *
>> >> >> > > ceil(n1/2) = 5 * ceil(n/4), and for the ngcd_matrix_adjust call,
>> >> >> > > we need n + 2. In total, 5 * ceil(n/4) + n + 2 <= 9 ceil(n/4) +
>> >> >> > > 2.
>> >> >> > >
>> >> >> > >   For the recursive call, we need S(n1) = S(ceil(n/2)).
>> >> >> > >
>> >> >> > >   S(n) <= 9*ceil(n/4) + 2 + S(ceil(n/2))
>> >> >> > >        <= 9*(ceil(n/4) + ... + ceil(n/2^(1+k))) + 2k +
>> >> >> > > S(ceil(n/2^k)) <= 9*(2 ceil(n/4) + k) + 2k + S(n/2^k)
>> >> >> > >        <= 18 ceil(n/4) + 11k + S(n/2^k)
>> >> >> > >
>> >> >> > > */
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_nhgcd_itch (mp_size_t n)
>> >> >> > > {
>> >> >> > >  unsigned k;
>> >> >> > >  mp_size_t nn;
>> >> >> > >
>> >> >> > >  /* Inefficient way to almost compute
>> >> >> > >     log_2(n/NHGCD_BASE_THRESHOLD) */
>> >> >> > >  for (k = 0, nn = n;
>> >> >> > >       ABOVE_THRESHOLD (nn, NHGCD_THRESHOLD);
>> >> >> > >       nn = (nn + 1) / 2)
>> >> >> > >    k++;
>> >> >> > >
>> >> >> > >  if (k == 0)
>> >> >> > >    return NHGCD_BASE_ITCH (n);
>> >> >> > >
>> >> >> > >  return 18 * ((n+3) / 4) + 11 * k
>> >> >> > >    + NHGCD_BASE_ITCH (NHGCD_THRESHOLD);
>> >> >> > > }
>> >> >> > >
>> >> >> > > /* Reduces a,b until |a-b| fits in n/2 + 1 limbs. Constructs
>> >> >> > > matrix M with elements of size at most (n+1)/2 - 1. Returns new
>> >> >> > > size of a, b, or zero if no reduction is possible. */
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_nhgcd (mp_ptr ap, mp_ptr bp, mp_size_t n,
>> >> >> > >           struct ngcd_matrix *M, mp_ptr tp)
>> >> >> > > {
>> >> >> > >  mp_size_t s = n/2 + 1;
>> >> >> > >  mp_size_t n2 = (3*n)/4 + 1;
>> >> >> > >
>> >> >> > >  mp_size_t p, nn;
>> >> >> > >  unsigned count;
>> >> >> > >  int success = 0;
>> >> >> > >
>> >> >> > >  ASSERT (n > s);
>> >> >> > >  ASSERT ((ap[n-1] | bp[n-1]) > 0);
>> >> >> > >
>> >> >> > >  ASSERT ((n+1)/2 - 1 < M->alloc);
>> >> >> > >
>> >> >> > >  if (BELOW_THRESHOLD (n, NHGCD_THRESHOLD))
>> >> >> > >    return nhgcd_base (ap, bp, n, M, tp);
>> >> >> > >
>> >> >> > >  p = n/2;
>> >> >> > >  nn = mpn_nhgcd (ap + p, bp + p, n - p, M, tp);
>> >> >> > >  if (nn > 0)
>> >> >> > >    {
>> >> >> > >      /* Needs 2*(p + M->n) <= 2*(floor(n/2) + ceil(n/2) - 1)
>> >> >> > >         = 2 (n - 1) */
>> >> >> > >      n = mpn_ngcd_matrix_adjust (M, p + nn, ap, bp, p, tp);
>> >> >> > >      success = 1;
>> >> >> > >    }
>> >> >> > >  count = 0;
>> >> >> > >  while (n > n2)
>> >> >> > >    {
>> >> >> > >      count++;
>> >> >> > >      /* Needs n + 1 storage */
>> >> >> > >      nn = mpn_ngcd_step (n, ap, bp, s, M, tp);
>> >> >> > >      if (!nn)
>> >> >> > >        return success ? n : 0;
>> >> >> > >      n = nn;
>> >> >> > >      success = 1;
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  if (n > s + 2)
>> >> >> > >    {
>> >> >> > >      struct ngcd_matrix M1;
>> >> >> > >      mp_size_t scratch;
>> >> >> > >
>> >> >> > >      p = 2*s - n + 1;
>> >> >> > >      scratch = MPN_NGCD_MATRIX_INIT_ITCH (n-p);
>> >> >> > >
>> >> >> > >      mpn_ngcd_matrix_init(&M1, n - p, tp);
>> >> >> > >      nn = mpn_nhgcd (ap + p, bp + p, n - p, &M1, tp + scratch);
>> >> >> > >      if (nn > 0)
>> >> >> > >        {
>> >> >> > >          /* Needs 2 (p + M->n) <= 2 (2*s - n2 + 1 + n2 - s - 1)
>> >> >> > >             = 2*s <= 2*(floor(n/2) + 1) <= n + 2. */
>> >> >> > >          n = mpn_ngcd_matrix_adjust (&M1, p + nn, ap, bp, p, tp +
>> >> >> > > scratch); /* Needs M->n <= n2 - s - 1 < n/4 */
>> >> >> > >          mpn_ngcd_matrix_mul (M, &M1, tp + scratch);
>> >> >> > >          success = 1;
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  /* FIXME: This really is the base case */
>> >> >> > >  for (count = 0;; count++)
>> >> >> > >    {
>> >> >> > >      /* Needs s+3 < n */
>> >> >> > >      nn = mpn_ngcd_step (n, ap, bp, s, M, tp);
>> >> >> > >      if (!nn)
>> >> >> > >        return success ? n : 0;
>> >> >> > >
>> >> >> > >      n = nn;
>> >> >> > >      success = 1;
>> >> >> > >    }
>> >> >> > > }
>> >> >> > >
>> >> >> > > #define EVEN_P(x) (((x) & 1) == 0)
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_gcd (mp_ptr gp, mp_ptr ap, mp_size_t an, mp_ptr bp, mp_size_t
>> >> >> > > n) {
>> >> >> > >  mp_size_t init_scratch;
>> >> >> > >  mp_size_t scratch;
>> >> >> > >  mp_ptr tp;
>> >> >> > >  TMP_DECL;
>> >> >> > >
>> >> >> > >  ASSERT (an >= n);
>> >> >> > >
>> >> >> > >  if (BELOW_THRESHOLD (n, NGCD_THRESHOLD))
>> >> >> > >    return mpn_basic_gcd (gp, ap, an, bp, n);
>> >> >> > >
>> >> >> > >  init_scratch = MPN_NGCD_MATRIX_INIT_ITCH ((n+1)/2);
>> >> >> > >  scratch = mpn_nhgcd_itch ((n+1)/2);
>> >> >> > >
>> >> >> > >  /* Space needed for mpn_ngcd_matrix_adjust */
>> >> >> > >  if (scratch < 2*n)
>> >> >> > >    scratch = 2*n;
>> >> >> > >
>> >> >> > >  TMP_MARK;
>> >> >> > >
>> >> >> > >  if (an + 1 > init_scratch + scratch)
>> >> >> > >    tp = TMP_ALLOC_LIMBS (an + 1);
>> >> >> > >  else
>> >> >> > >    tp = TMP_ALLOC_LIMBS (init_scratch + scratch);
>> >> >> > >
>> >> >> > >  if (an > n)
>> >> >> > >    {
>> >> >> > >      mp_ptr rp = tp;
>> >> >> > >      mp_ptr qp = rp + n;
>> >> >> > >
>> >> >> > >      mpn_tdiv_qr (qp, rp, 0, ap, an, bp, n);
>> >> >> > >      MPN_COPY (ap, rp, n);
>> >> >> > >      an = n;
>> >> >> > >      MPN_NORMALIZE (ap, an);
>> >> >> > >      if (an == 0)
>> >> >> > >        {
>> >> >> > >          MPN_COPY (gp, bp, n);
>> >> >> > >          TMP_FREE;
>> >> >> > >          return n;
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  while (ABOVE_THRESHOLD (n, NGCD_THRESHOLD))
>> >> >> > >    {
>> >> >> > >      struct ngcd_matrix M;
>> >> >> > >      mp_size_t p = n/2;
>> >> >> > >      mp_size_t nn;
>> >> >> > >
>> >> >> > >      mpn_ngcd_matrix_init (&M, n - p, tp);
>> >> >> > >      nn = mpn_nhgcd (ap + p, bp + p, n - p, &M, tp +
>> >> >> > > init_scratch); if (nn > 0)
>> >> >> > >        /* Needs 2*(p + M->n) <= 2*(floor(n/2) + ceil(n/2) - 1)
>> >> >> > >           = 2(n-1) */
>> >> >> > >        n = mpn_ngcd_matrix_adjust (&M, p + nn, ap, bp, p, tp +
>> >> >> > > init_scratch);
>> >> >> > >
>> >> >> > >      else
>> >> >> > >        {
>> >> >> > >          mp_size_t gn;
>> >> >> > >          n = mpn_ngcd_subdiv_step (gp, &gn, ap, bp, n, tp);
>> >> >> > >          if (n == 0)
>> >> >> > >            {
>> >> >> > >              TMP_FREE;
>> >> >> > >              return gn;
>> >> >> > >            }
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  ASSERT (ap[n-1] > 0 || bp[n-1] > 0);
>> >> >> > > #if 0
>> >> >> > >  /* FIXME: We may want to use lehmer on some systems. */
>> >> >> > >  n = mpn_ngcd_lehmer (gp, ap, bp, n, tp);
>> >> >> > >
>> >> >> > >  TMP_FREE;
>> >> >> > >  return n;
>> >> >> > > #endif
>> >> >> > >
>> >> >> > >  if (ap[n-1] < bp[n-1])
>> >> >> > >    MP_PTR_SWAP (ap, bp);
>> >> >> > >
>> >> >> > >  an = n;
>> >> >> > >  MPN_NORMALIZE (bp, n);
>> >> >> > >
>> >> >> > >  if (n == 0)
>> >> >> > >    {
>> >> >> > >      MPN_COPY (gp, ap, an);
>> >> >> > >      TMP_FREE;
>> >> >> > >      return an;
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  if (EVEN_P (bp[0]))
>> >> >> > >    {
>> >> >> > >      /* Then a must be odd (since the calling convention implies
>> >> >> > > that there's no common factor of 2) */
>> >> >> > >      ASSERT (!EVEN_P (ap[0]));
>> >> >> > >
>> >> >> > >      while (bp[0] == 0)
>> >> >> > >        {
>> >> >> > >          bp++;
>> >> >> > >          n--;
>> >> >> > >        }
>> >> >> > >
>> >> >> > >      if (EVEN_P(bp[0]))
>> >> >> > >        {
>> >> >> > >          int count;
>> >> >> > >          count_trailing_zeros (count, bp[0]);
>> >> >> > >          ASSERT_NOCARRY (mpn_rshift (bp, bp, n, count));
>> >> >> > >          n -= (bp[n-1] == 0);
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  TMP_FREE;
>> >> >> > >  return mpn_basic_gcd (gp, ap, an, bp, n);
>> >> >> > > }
>> >> >> > >
>> >> >> > > /* Schönhage's 1987 algorithm, reorganized into hgcd form */
>> >> >> > >
>> >> >> > > #include <stdio.h>  /* for NULL */
>> >> >> > >
>> >> >> > > #include "gmp.h"
>> >> >> > > #include "gmp-impl.h"
>> >> >> > > #include "longlong.h"
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > /* For input of size n, matrix elements are of size at most
>> >> >> > > ceil(n/2) - 1, but we need one limb extra. */
>> >> >> > >
>> >> >> > > void
>> >> >> > > mpn_ngcd_matrix_init (struct ngcd_matrix *M, mp_size_t n, mp_ptr
>> >> >> > > p);
>> >> >> > >
>> >> >> > > #define NHGCD_BASE_ITCH MPN_NGCD_STEP_ITCH
>> >> >> > >
>> >> >> > > /* Reduces a,b until |a-b| fits in n/2 + 1 limbs. Constructs
>> >> >> > > matrix M with elements of size at most (n+1)/2 - 1. Returns new
>> >> >> > > size of a, b, or zero if no reduction is possible. */
>> >> >> > > static mp_size_t
>> >> >> > > nhgcd_base (mp_ptr ap, mp_ptr bp, mp_size_t n,
>> >> >> > >            struct ngcd_matrix *M, mp_ptr tp);
>> >> >> > >
>> >> >> > > /* Size analysis for nhgcd:
>> >> >> > >
>> >> >> > >   For the recursive calls, we have n1 <= ceil(n / 2). Then the
>> >> >> > >   storage need is determined by the storage for the recursive
>> >> >> > > call computing M1, and ngcd_matrix_adjust and ngcd_matrix_mul
>> >> >> > > calls that use M1 (after this, the storage needed for M1 can be
>> >> >> > > recycled).
>> >> >> > >
>> >> >> > >   Let S(r) denote the required storage. For M1 we need 5 *
>> >> >> > > ceil(n1/2) = 5 * ceil(n/4), and for the ngcd_matrix_adjust call,
>> >> >> > > we need n + 2. In total, 5 * ceil(n/4) + n + 2 <= 9 ceil(n/4) +
>> >> >> > > 2.
>> >> >> > >
>> >> >> > >   For the recursive call, we need S(n1) = S(ceil(n/2)).
>> >> >> > >
>> >> >> > >   S(n) <= 9*ceil(n/4) + 2 + S(ceil(n/2))
>> >> >> > >        <= 9*(ceil(n/4) + ... + ceil(n/2^(1+k))) + 2k +
>> >> >> > > S(ceil(n/2^k)) <= 9*(2 ceil(n/4) + k) + 2k + S(n/2^k)
>> >> >> > >        <= 18 ceil(n/4) + 11k + S(n/2^k)
>> >> >> > >
>> >> >> > > */
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_nhgcd_itch (mp_size_t n);
>> >> >> > >
>> >> >> > >
>> >> >> > > /* Reduces a,b until |a-b| fits in n/2 + 1 limbs. Constructs
>> >> >> > > matrix M with elements of size at most (n+1)/2 - 1. Returns new
>> >> >> > > size of a, b, or zero if no reduction is possible. */
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_nhgcd (mp_ptr ap, mp_ptr bp, mp_size_t n,
>> >> >> > >           struct ngcd_matrix *M, mp_ptr tp);
>> >> >> > >
>> >> >> > >
>> >> >> > > #define EVEN_P(x) (((x) & 1) == 0)
>> >> >> > >
>> >> >> > > mp_size_t
>> >> >> > > mpn_ngcd (mp_ptr gp, mp_ptr ap, mp_size_t an, mp_ptr bp,
>> >> >> > > mp_size_t n) {
>> >> >> > >  mp_size_t init_scratch;
>> >> >> > >  mp_size_t scratch;
>> >> >> > >  mp_ptr tp;
>> >> >> > >  TMP_DECL;
>> >> >> > >
>> >> >> > >  ASSERT (an >= n);
>> >> >> > >
>> >> >> > >  if (BELOW_THRESHOLD (n, NGCD_THRESHOLD))
>> >> >> > >    return mpn_basic_gcd (gp, ap, an, bp, n);
>> >> >> > >
>> >> >> > >  init_scratch = MPN_NGCD_MATRIX_INIT_ITCH ((n+1)/2);
>> >> >> > >  scratch = mpn_nhgcd_itch ((n+1)/2);
>> >> >> > >
>> >> >> > >  /* Space needed for mpn_ngcd_matrix_adjust */
>> >> >> > >  if (scratch < 2*n)
>> >> >> > >    scratch = 2*n;
>> >> >> > >
>> >> >> > >  TMP_MARK;
>> >> >> > >
>> >> >> > >  if (an + 1 > init_scratch + scratch)
>> >> >> > >    tp = TMP_ALLOC_LIMBS (an + 1);
>> >> >> > >  else
>> >> >> > >    tp = TMP_ALLOC_LIMBS (init_scratch + scratch);
>> >> >> > >
>> >> >> > >  if (an > n)
>> >> >> > >    {
>> >> >> > >      mp_ptr rp = tp;
>> >> >> > >      mp_ptr qp = rp + n;
>> >> >> > >
>> >> >> > >      mpn_tdiv_qr (qp, rp, 0, ap, an, bp, n);
>> >> >> > >      MPN_COPY (ap, rp, n);
>> >> >> > >      an = n;
>> >> >> > >      MPN_NORMALIZE (ap, an);
>> >> >> > >      if (an == 0)
>> >> >> > >        {
>> >> >> > >          MPN_COPY (gp, bp, n);
>> >> >> > >          TMP_FREE;
>> >> >> > >          return n;
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  while (ABOVE_THRESHOLD (n, NGCD_THRESHOLD))
>> >> >> > >    {
>> >> >> > >      struct ngcd_matrix M;
>> >> >> > >      mp_size_t p = n/2;
>> >> >> > >      mp_size_t nn;
>> >> >> > >
>> >> >> > >      mpn_ngcd_matrix_init (&M, n - p, tp);
>> >> >> > >      nn = mpn_nhgcd (ap + p, bp + p, n - p, &M, tp +
>> >> >> > > init_scratch); if (nn > 0)
>> >> >> > >        /* Needs 2*(p + M->n) <= 2*(floor(n/2) + ceil(n/2) - 1)
>> >> >> > >           = 2(n-1) */
>> >> >> > >        n = mpn_ngcd_matrix_adjust (&M, p + nn, ap, bp, p, tp +
>> >> >> > > init_scratch);
>> >> >> > >
>> >> >> > >      else
>> >> >> > >        {
>> >> >> > >          mp_size_t gn;
>> >> >> > >          n = mpn_ngcd_subdiv_step (gp, &gn, ap, bp, n, tp);
>> >> >> > >          if (n == 0)
>> >> >> > >            {
>> >> >> > >              TMP_FREE;
>> >> >> > >              return gn;
>> >> >> > >            }
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  ASSERT (ap[n-1] > 0 || bp[n-1] > 0);
>> >> >> > > #if 0
>> >> >> > >  /* FIXME: We may want to use lehmer on some systems. */
>> >> >> > >  n = mpn_ngcd_lehmer (gp, ap, bp, n, tp);
>> >> >> > >
>> >> >> > >  TMP_FREE;
>> >> >> > >  return n;
>> >> >> > > #endif
>> >> >> > >
>> >> >> > >  if (ap[n-1] < bp[n-1])
>> >> >> > >    MP_PTR_SWAP (ap, bp);
>> >> >> > >
>> >> >> > >  an = n;
>> >> >> > >  MPN_NORMALIZE (bp, n);
>> >> >> > >
>> >> >> > >  if (n == 0)
>> >> >> > >    {
>> >> >> > >      MPN_COPY (gp, ap, an);
>> >> >> > >      TMP_FREE;
>> >> >> > >      return an;
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  if (EVEN_P (bp[0]))
>> >> >> > >    {
>> >> >> > >      /* Then a must be odd (since the calling convention implies
>> >> >> > > that there's no common factor of 2) */
>> >> >> > >      ASSERT (!EVEN_P (ap[0]));
>> >> >> > >
>> >> >> > >      while (bp[0] == 0)
>> >> >> > >        {
>> >> >> > >          bp++;
>> >> >> > >          n--;
>> >> >> > >        }
>> >> >> > >
>> >> >> > >      if (EVEN_P(bp[0]))
>> >> >> > >        {
>> >> >> > >          int count;
>> >> >> > >          count_trailing_zeros (count, bp[0]);
>> >> >> > >          ASSERT_NOCARRY (mpn_rshift (bp, bp, n, count));
>> >> >> > >          n -= (bp[n-1] == 0);
>> >> >> > >        }
>> >> >> > >    }
>> >> >> > >
>> >> >> > >  TMP_FREE;
>> >> >> > >  return mpn_basic_gcd (gp, ap, an, bp, n);
>> >> >> > > }
>>
>>
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

[mpir-devel] Re: Need some review with these files

Reply via email to