Some more justification for my patch

I wrote some code that replaces mpz_millerrabin with a static lookup and
ran it for a number of values.

Existing Code (with real mpz_millerrabin)
100, gap: 1533, 0.00164 seconds
200, gap: 2547, 0.00675 seconds
300, gap: 3757, 0.01161 seconds
500, gap: 3765, 0.02568 seconds
1000, gap: 20835, 0.67636 seconds
2000, gap: 23685, 6.03032 seconds
5000, gap: 58867, 167.65375 seconds

With is_prime_lookup (e.g. nearly instant mpz_millerrabin)
100, gap: 1533, 0.00018 seconds
200, gap: 2547, 0.00045 seconds
300, gap: 3757, 0.00087 seconds
500, gap: 3765, 0.00085 seconds
1000, gap: 27393, 0.00580 seconds
2000, gap: 27711, 0.00557 seconds
5000, gap: 74443, 0.01320 seconds

Code in nextprime_segment.patch (see previous message) with is_prime_lookup
100, gap: 1533, 0.00011 seconds
200, gap: 2547, 0.00015 seconds
300, gap: 3757, 0.00018 seconds
500, gap: 3765, 0.00023 seconds
1000, gap: 27393, 0.00061 seconds
2000, gap: 27711, 0.00070 seconds
5000, gap: 74443, 0.00198 seconds

So the overhead from the existing code is less than
6% at 200 bits
3% at 500 bits
1% at 1000 bits.

The new code improves this to
2% at 200 bits
1% at 500 bits
0.1% at 1000 bits

I'm going to use this program to tune a new prime_limit on the range
300-5000 bits which is harder to tune with tune/speed.


On Fri, Mar 6, 2020 at 1:55 PM Seth Troisi <brain...@gmail.com> wrote:

> I tried using a segmented sieve. this avoids multiplication or mod in the
> inner loop.
> It's an improvement and the code is easy to comprehend.
>
> $ ./tune/speed -s 100-300,500,1000 -f 1.1 mpz_nextprime -p mpz_nextprime
> $ hg import ~/Downloads/nextprime_segment.patch --no-commit
> $ make -j4; make -C tune/ speed -j4
> $ ./tune/speed -s 100-300,500,1000 -f 1.1 mpz_nextprime -P mpz_nextprime2
>
> $ pr -m -t mpz_nextprime.data mpz_nextprime2.data | awk '{ print
> $0,"\t"$4/$2 }'
> 100 3.658593750e-05    100     3.326367187e-05 0.909193
> 110 3.904296875e-05    110     3.598828125e-05 0.921761
> 121 4.675585937e-05    121     4.157031250e-05 0.889093
> 133 8.379882812e-05    133     7.977343750e-05 0.951964
> 146 9.650195312e-05    146     8.803710937e-05 0.912283
> 160 1.245546875e-04    160     1.032539062e-04 0.828985
> 176 1.269101562e-04    176     1.173007813e-04 0.924282
> 193 1.977187500e-04    193     1.846230469e-04 0.933766
> 212 2.087812500e-04    212     1.933378906e-04 0.926031
> 233 2.356660156e-04    233     2.218164063e-04 0.941232
> 256 3.030214844e-04    256     2.868691406e-04 0.946696
> 281 4.203593750e-04    281     3.908398437e-04 0.929775
> 500 2.132431641e-03    500     2.013937500e-03 0.944432
> 1000 2.307201758e-02    1000     2.284832031e-02 0.990304
>
> This can undoubtedly benefit from a larger sieve also,
> which I'm testing in a different patch.
>
> On Thu, Mar 5, 2020 at 6:05 PM Seth Troisi <brain...@gmail.com> wrote:
>
>> I tried using a segmented sieve. this avoids multiplication or mod in the
>> inner loop.
>> It's an improvement and the code is easy to comprehend.
>>
>>
>>
>>
>> On Wed, Feb 12, 2020 at 10:25 AM Niels Möller <ni...@lysator.liu.se>
>> wrote:
>>
>>> "Marco Bodrato" <bodr...@mail.dm.unipi.it> writes:
>>>
>>> > Ciao,
>>> >
>>> > Il Mer, 12 Febbraio 2020 7:26 am, Niels Möller ha scritto:
>>> >> And for a small prime gap (common case), this cost should be the main
>>> >> part of the sieving. So it would make sense to optimize it. If we also
>>> >> use the ptab, we could compute modulo single-limb prime products using
>>> >> precomputed constants.
>>> >
>>> > Yes, of course.
>>>
>>> Another version below, using ptab. Timing:
>>>
>>> $ ./tune/speed -s 100-300,500,1000 -f 1.1 mpz_nextprime
>>>         mpz_nextprime
>>> 100       0.000116762
>>> 110       0.000126770
>>> 121       0.000147203
>>> 133       0.000257994
>>> 146       0.000291051
>>> 160       0.000343168
>>> 176       0.000386605
>>> 193       0.000625994
>>> 212       0.000670035
>>> 233       0.000763240
>>> 256       0.000942154
>>> 281       0.001418639
>>> 500       0.006632588
>>> 1000      0.073935992
>>>
>>> So also close to 15% speedup for 100 bits, and the same as previous
>>> version, almost 25%, for 1000 bits.
>>>
>>> I used the very crude tuning
>>>
>>>   limit = (nbits < 200) ? nbits / 2 : PTAB_SIZE;
>>>
>>> to avoid regressing for smaller sizes.
>>>
>>> > And the next question will be: should we generate on the fly an even
>>> > larger table of primes when the required size grows beyond the
>>> > precomputed table?
>>>
>>> Don't know. I don't quite understand how large a table could be
>>> beneficial, but I guess it might make sense to at least limit it to size
>>> of L1 cache.
>>>
>>> Regards,
>>> /Niels
>>>
>>> /* mpz_nextprime(p,t) - compute the next prime > t and store that in p.
>>>
>>> Copyright 1999-2001, 2008, 2009, 2012 Free Software Foundation, Inc.
>>>
>>> Contributed to the GNU project by Niels Möller and Torbjorn Granlund.
>>>
>>> This file is part of the GNU MP Library.
>>>
>>> The GNU MP Library is free software; you can redistribute it and/or
>>> modify
>>> it under the terms of either:
>>>
>>>   * the GNU Lesser General Public License as published by the Free
>>>     Software Foundation; either version 3 of the License, or (at your
>>>     option) any later version.
>>>
>>> or
>>>
>>>   * the GNU General Public License as published by the Free Software
>>>     Foundation; either version 2 of the License, or (at your option) any
>>>     later version.
>>>
>>> or both in parallel, as here.
>>>
>>> The GNU MP Library is distributed in the hope that it will be useful, but
>>> WITHOUT ANY WARRANTY; without even the implied warranty of
>>> MERCHANTABILITY
>>> or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> for more details.
>>>
>>> You should have received copies of the GNU General Public License and the
>>> GNU Lesser General Public License along with the GNU MP Library.  If not,
>>> see https://www.gnu.org/licenses/.  */
>>>
>>> #include "gmp-impl.h"
>>> #include "longlong.h"
>>>
>>> #if 1
>>> struct gmp_primes_dtab {
>>>   mp_limb_t p;
>>>   mp_limb_t binv;
>>>   mp_limb_t lim;
>>> };
>>>
>>> struct gmp_primes_ptab {
>>>   mp_limb_t ppp;        /* primes, multiplied together */
>>>   mp_limb_t cps[7];     /* ppp values pre-computed for mpn_mod_1s_4p */
>>>   gmp_uint_least32_t idx:24;    /* index of  first primes in dtab */
>>>   gmp_uint_least32_t np :8;     /* number of primes related to this
>>> entry */
>>> };
>>>
>>> static const struct gmp_primes_dtab dtab[] =
>>> {
>>> #define WANT_dtab
>>> #define P(p,inv,lim) {p, inv,lim}
>>> #include "trialdivtab.h"
>>> #undef WANT_dtab
>>> #undef P
>>> };
>>>
>>> #define DTAB_SIZE (sizeof (dtab) / sizeof (dtab[0]))
>>>
>>> static const struct gmp_primes_ptab ptab[] =
>>> {
>>> #define WANT_ptab
>>> #include "trialdivtab.h"
>>> #undef WANT_ptab
>>> };
>>>
>>> #define PTAB_SIZE (sizeof (ptab) / sizeof (ptab[0]))
>>>
>>> void
>>> mpz_nextprime (mpz_ptr p, mpz_srcptr n)
>>> {
>>>   mp_limb_t *moduli;
>>>   unsigned long difference;
>>>   int i;
>>>   unsigned prime_limit;
>>>   mp_bitcnt_t nbits;
>>>   unsigned incr;
>>>   TMP_DECL;
>>>
>>>   /* First handle tiny numbers */
>>>   if (mpz_cmp_ui (n, 2) < 0)
>>>     {
>>>       mpz_set_ui (p, 2);
>>>       return;
>>>     }
>>>   mpz_add_ui (p, n, 1);
>>>   mpz_setbit (p, 0);
>>>
>>>   if (mpz_cmp_ui (p, 7) <= 0)
>>>     return;
>>>
>>>   MPN_SIZEINBASE_2EXP(nbits, PTR(p), SIZ(p), 1);
>>>
>>>   if (nbits < 200)
>>>     {
>>>       prime_limit = nbits / 2;
>>>       if (SIZ(p) == 1)
>>>         {
>>>           /* If the prime list includes any prime >= p, do plain
>>>              search. */
>>>           mp_limb_t limb = PTR(p)[0];
>>>           unsigned last = ptab[prime_limit-1].idx +
>>> ptab[prime_limit-1].np - 1;
>>>           if (dtab[last].p >= limb)
>>>             {
>>>               while (dtab[last-1].p >= limb)
>>>                 last--;
>>>               PTR(p)[0] = dtab[last].p;
>>>               return;
>>>             }
>>>         }
>>>     }
>>>   else
>>>     prime_limit = PTAB_SIZE;
>>>
>>>   TMP_MARK;
>>>
>>>   /* Compute residues modulo small odd primes */
>>>   moduli = TMP_ALLOC_TYPE (prime_limit, mp_limb_t);
>>>
>>>   for (;;)
>>>     {
>>>       for (i = 0; i < prime_limit; i++)
>>>         {
>>>           mp_limb_t ppp = ptab[i].ppp;
>>>           const mp_limb_t *cps = ptab[i].cps;
>>>           moduli[i] = mpn_mod_1s_4p (PTR(p), SIZ(p), ppp << cps[1], cps);
>>>         }
>>>
>>> #define INCR_LIMIT 0x10000      /* deep science */
>>>
>>>       for (difference = incr = 0; incr < INCR_LIMIT; difference += 2)
>>>         {
>>>           /* First check divisibility based on prime list */
>>>           for (i = 0; i < prime_limit; i++)
>>>             {
>>>               /* FIXME: Ensure that this addition can't overflow */
>>>               mp_limb_t m = moduli[i] + incr;
>>>               int idx = ptab[i].idx;
>>>               int np = ptab[i].np;
>>>               int j;
>>>
>>>               for (j = 0; j < np; j++)
>>>                 if (m * dtab[idx+j].binv <= dtab[idx+j].lim)
>>>                   goto next;
>>>             }
>>>
>>>           mpz_add_ui (p, p, difference);
>>>           difference = 0;
>>>
>>>           /* Miller-Rabin test */
>>>           if (mpz_millerrabin (p, 25))
>>>             goto done;
>>>         next:;
>>>           incr += 2;
>>>         }
>>>       mpz_add_ui (p, p, difference);
>>>       difference = 0;
>>>     }
>>>  done:
>>>   TMP_FREE;
>>> }
>>> #else
>>> static const unsigned char primegap[] =
>>> {
>>>   2,2,4,2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,14,4,6,
>>>   2,10,2,6,6,4,6,6,2,10,2,4,2,12,12,4,2,4,6,2,10,6,6,6,2,6,4,2,10,14,4,2,
>>>   4,14,6,10,2,4,6,8,6,6,4,6,8,4,8,10,2,10,2,6,4,6,8,4,2,4,12,8,4,8,4,6,
>>>
>>> 12,2,18,6,10,6,6,2,6,10,6,6,2,6,6,4,2,12,10,2,4,6,6,2,12,4,6,8,10,8,10,8,
>>>
>>> 6,6,4,8,6,4,8,4,14,10,12,2,10,2,4,2,10,14,4,2,4,14,4,2,4,20,4,8,10,8,4,6,
>>>   6,14,4,6,6,8,6,12
>>> };
>>>
>>> #define NUMBER_OF_PRIMES 167
>>>
>>> void
>>> mpz_nextprime (mpz_ptr p, mpz_srcptr n)
>>> {
>>>   unsigned short *moduli;
>>>   unsigned long difference;
>>>   int i;
>>>   unsigned prime_limit;
>>>   unsigned long prime;
>>>   mp_size_t pn;
>>>   mp_bitcnt_t nbits;
>>>   unsigned incr;
>>>   TMP_SDECL;
>>>
>>>   /* First handle tiny numbers */
>>>   if (mpz_cmp_ui (n, 2) < 0)
>>>     {
>>>       mpz_set_ui (p, 2);
>>>       return;
>>>     }
>>>   mpz_add_ui (p, n, 1);
>>>   mpz_setbit (p, 0);
>>>
>>>   if (mpz_cmp_ui (p, 7) <= 0)
>>>     return;
>>>
>>>   pn = SIZ(p);
>>>   MPN_SIZEINBASE_2EXP(nbits, PTR(p), pn, 1);
>>>   if (nbits / 2 >= NUMBER_OF_PRIMES)
>>>     prime_limit = NUMBER_OF_PRIMES - 1;
>>>   else
>>>     prime_limit = nbits / 2;
>>>
>>>   TMP_SMARK;
>>>
>>>   /* Compute residues modulo small odd primes */
>>>   moduli = TMP_SALLOC_TYPE (prime_limit, unsigned short);
>>>
>>>   for (;;)
>>>     {
>>>       /* FIXME: Compute lazily? */
>>>       prime = 3;
>>>       for (i = 0; i < prime_limit; i++)
>>>         {
>>>           moduli[i] = mpz_tdiv_ui (p, prime);
>>>           prime += primegap[i];
>>>         }
>>>
>>> #define INCR_LIMIT 0x10000      /* deep science */
>>>
>>>       for (difference = incr = 0; incr < INCR_LIMIT; difference += 2)
>>>         {
>>>           /* First check residues */
>>>           prime = 3;
>>>           for (i = 0; i < prime_limit; i++)
>>>             {
>>>               unsigned r;
>>>               /* FIXME: Reduce moduli + incr and store back, to allow for
>>>                  division-free reductions.  Alternatively, table
>>> primes[]'s
>>>                  inverses (mod 2^16).  */
>>>               r = (moduli[i] + incr) % prime;
>>>               prime += primegap[i];
>>>
>>>               if (r == 0)
>>>                 goto next;
>>>             }
>>>
>>>           mpz_add_ui (p, p, difference);
>>>           difference = 0;
>>>
>>>           /* Miller-Rabin test */
>>>           if (mpz_millerrabin (p, 25))
>>>             goto done;
>>>         next:;
>>>           incr += 2;
>>>         }
>>>       mpz_add_ui (p, p, difference);
>>>       difference = 0;
>>>     }
>>>  done:
>>>   TMP_SFREE;
>>> }
>>>
>>> #endif
>>>
>>> --
>>> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
>>> Internet email is subject to wholesale government surveillance.
>>>
>>
diff -r bcca14c8a090 mpz/nextprime.c
--- a/mpz/nextprime.c	Thu Mar 05 00:43:26 2020 +0100
+++ b/mpz/nextprime.c	Fri Mar 06 15:52:15 2020 -0800
@@ -33,6 +33,50 @@
 #include "gmp-impl.h"
 #include "longlong.h"
 
+// Hack while testing speed of wrapper
+
+int is_prime[200] = {
+976371562, 976371616, 976371732, 976371810, 976371862,
+976371912, 976371928, 976372096, 976372182, 976372206,
+976372222, 976372240, 976372302, 976372380, 976372416,
+976372422, 976372498, 976372632, 976372762, 976372818,
+499445307, 499445769, 499446099, 499446327, 499446357,
+499446537, 499446575, 499446599, 499446665, 499446785,
+499446795, 499446813, 499446845, 499446899, 499446989,
+499447085, 499447149, 499447235, 499447389, 499447619,
+322050916, 322051144, 322051390, 322051810, 322051830,
+322051872, 322052014, 322052160, 322052484, 322052532,
+322052700, 322052806, 322052850, 322052950, 322053016,
+322053214, 322053876, 322054030, 322054306, 322054516,
+390483062, 390483142, 390483328, 390483560, 390483604,
+390484082, 390484280, 390484624, 390484828, 390484918,
+390484952, 390485044, 390485204, 390485710, 390485882,
+390485900, 390486074, 390486164, 390486434, 390486772,
+688423507, 688429685, 688430003, 688433431, 688433837,
+688434047, 688434083, 688438613, 688439131, 688439215,
+688441481, 688442341, 688443121, 688444045, 688447163,
+688448081, 688449137, 688449391, 688449541, 688450603,
+749219357, 749219441, 749223757, 749224489, 749226371,
+749229451, 749230151, 749230903, 749231839, 749231863,
+749232413, 749232641, 749233363, 749233633, 749234701,
+749242201, 749242717, 749245849, 749246017, 749246227,
+860194732, 860194900, 860195722, 860201212, 860207378,
+860208208, 860208476, 860216788, 860217086, 860218496,
+860219848, 860225012, 860236540, 860249998, 860251442,
+860263046, 860263790, 860264198, 860264780, 860267018,
+};
+
+int
+mpz_is_prime_lookup(mpz_ptr p) {
+  int lookup = mpz_fdiv_ui(p, 1000000007);
+  for (int i = 0; i < 160; i++)
+    if (is_prime[i] == lookup) {
+      return 1;
+    }
+  return 0;
+}
+
+
 static const unsigned char primegap[] =
 {
   2,2,4,2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,14,4,6,
@@ -115,7 +159,8 @@
 	  difference = 0;
 
 	  /* Miller-Rabin test */
-	  if (mpz_millerrabin (p, 25))
+//	  if (mpz_millerrabin (p, 25))
+	  if (mpz_is_prime_lookup(p))
 	    goto done;
 	next:;
 	  incr += 2;
diff -r bcca14c8a090 tests/mpz/t-nextprime.c
--- a/tests/mpz/t-nextprime.c	Thu Mar 05 00:43:26 2020 +0100
+++ b/tests/mpz/t-nextprime.c	Fri Mar 06 15:52:15 2020 -0800
@@ -20,6 +20,7 @@
 
 #include <stdio.h>
 #include <stdlib.h>
+#include <sys/time.h>
 
 #include "gmp-impl.h"
 #include "tests.h"
@@ -97,8 +98,10 @@
 
   for (i = 0; i < reps; i++)
     {
+      gmp_printf("{\"%Zd\": ", x);
       mpz_nextprime (y, x);
       mpz_sub (x, y, x);
+      gmp_printf("%d},\n", mpz_get_ui(x));
       if (diffs != NULL &&
 	  (! mpz_fits_sshort_p (x) || diffs[i] != (short) mpz_get_ui (x)))
 	{
@@ -170,24 +173,59 @@
 
   rands = RANDS;
   TESTS_REPS (reps, argv, argc);
+    
+  struct timeval tval_before, tval_after, tval_result;
 
-  test_ref(rands, reps);
+  mpz_t n, m, d;
+  mpz_init(n);
+  mpz_init(m);
+  mpz_init(d);
 
-  run ("2", 1000, "0x1ef7", diff1);
+  int sizes[] = {100, 200, 300, 500, 1000, 2000, 5000};
+  for (int i = 0; i < sizeof(sizes)/sizeof(sizes[0]); i++) {
+    int size = sizes[i];
+
+    mpz_ui_pow_ui(n, 2, size);
+    mpz_set(m, n);
 
-  run ("3", 1000 - 1, "0x1ef7", NULL);
+    gettimeofday(&tval_before, NULL);
+
+    for (int t = 0; t < 20; t++) {
+      mpz_nextprime(n, n);
+      mpz_sub(d, n, m);
+    //  gmp_printf("%d,\n", mpz_fdiv_ui(n, 1000000007)); // Poor man's hash
+    }
 
-  run ("0x8a43866f5776ccd5b02186e90d28946aeb0ed914", 50,
-       "0x8a43866f5776ccd5b02186e90d28946aeb0eeec5", diff3);
+    gettimeofday(&tval_after, NULL);
+    timersub(&tval_after, &tval_before, &tval_result);
+    float seconds = tval_result.tv_sec;
+    seconds += tval_result.tv_usec / 1e6;
 
-  run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6C", 50, /* 2^148 - 148 */
-       "0x100000000000000000000000000000000010ab", diff4);
+    mpz_sub(m, n, m);
+    gmp_printf("%d, gap: %Zd, %.5f seconds\n", size, m, seconds);
+  }
+
+  mpz_clear(n);
+  mpz_clear(m);
+  mpz_clear(d);
+
+//  test_ref(rands, reps);
 
-  run ("0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898d8b1b", 50,
-       "0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898da957", diff5);
-
-  run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF80", 50, /* 2^128 - 128 */
-       "0x10000000000000000000000000000155B", diff6);
+//  run ("2", 1000, "0x1ef7", diff1);
+//
+//  run ("3", 1000 - 1, "0x1ef7", NULL);
+//
+//  run ("0x8a43866f5776ccd5b02186e90d28946aeb0ed914", 50,
+//       "0x8a43866f5776ccd5b02186e90d28946aeb0eeec5", diff3);
+//
+//  run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6C", 50, /* 2^148 - 148 */
+//       "0x100000000000000000000000000000000010ab", diff4);
+//
+//  run ("0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898d8b1b", 50,
+//       "0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898da957", diff5);
+//
+//  run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF80", 50, /* 2^128 - 128 */
+//       "0x10000000000000000000000000000155B", diff6);
 
   // Too slow to include in normal testing.
   //test_largegaps ();
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Reply via email to