Some more justification for my patch I wrote some code that replaces mpz_millerrabin with a static lookup and ran it for a number of values.
Existing Code (with real mpz_millerrabin) 100, gap: 1533, 0.00164 seconds 200, gap: 2547, 0.00675 seconds 300, gap: 3757, 0.01161 seconds 500, gap: 3765, 0.02568 seconds 1000, gap: 20835, 0.67636 seconds 2000, gap: 23685, 6.03032 seconds 5000, gap: 58867, 167.65375 seconds With is_prime_lookup (e.g. nearly instant mpz_millerrabin) 100, gap: 1533, 0.00018 seconds 200, gap: 2547, 0.00045 seconds 300, gap: 3757, 0.00087 seconds 500, gap: 3765, 0.00085 seconds 1000, gap: 27393, 0.00580 seconds 2000, gap: 27711, 0.00557 seconds 5000, gap: 74443, 0.01320 seconds Code in nextprime_segment.patch (see previous message) with is_prime_lookup 100, gap: 1533, 0.00011 seconds 200, gap: 2547, 0.00015 seconds 300, gap: 3757, 0.00018 seconds 500, gap: 3765, 0.00023 seconds 1000, gap: 27393, 0.00061 seconds 2000, gap: 27711, 0.00070 seconds 5000, gap: 74443, 0.00198 seconds So the overhead from the existing code is less than 6% at 200 bits 3% at 500 bits 1% at 1000 bits. The new code improves this to 2% at 200 bits 1% at 500 bits 0.1% at 1000 bits I'm going to use this program to tune a new prime_limit on the range 300-5000 bits which is harder to tune with tune/speed. On Fri, Mar 6, 2020 at 1:55 PM Seth Troisi <brain...@gmail.com> wrote: > I tried using a segmented sieve. this avoids multiplication or mod in the > inner loop. > It's an improvement and the code is easy to comprehend. > > $ ./tune/speed -s 100-300,500,1000 -f 1.1 mpz_nextprime -p mpz_nextprime > $ hg import ~/Downloads/nextprime_segment.patch --no-commit > $ make -j4; make -C tune/ speed -j4 > $ ./tune/speed -s 100-300,500,1000 -f 1.1 mpz_nextprime -P mpz_nextprime2 > > $ pr -m -t mpz_nextprime.data mpz_nextprime2.data | awk '{ print > $0,"\t"$4/$2 }' > 100 3.658593750e-05 100 3.326367187e-05 0.909193 > 110 3.904296875e-05 110 3.598828125e-05 0.921761 > 121 4.675585937e-05 121 4.157031250e-05 0.889093 > 133 8.379882812e-05 133 7.977343750e-05 0.951964 > 146 9.650195312e-05 146 8.803710937e-05 0.912283 > 160 1.245546875e-04 160 1.032539062e-04 0.828985 > 176 1.269101562e-04 176 1.173007813e-04 0.924282 > 193 1.977187500e-04 193 1.846230469e-04 0.933766 > 212 2.087812500e-04 212 1.933378906e-04 0.926031 > 233 2.356660156e-04 233 2.218164063e-04 0.941232 > 256 3.030214844e-04 256 2.868691406e-04 0.946696 > 281 4.203593750e-04 281 3.908398437e-04 0.929775 > 500 2.132431641e-03 500 2.013937500e-03 0.944432 > 1000 2.307201758e-02 1000 2.284832031e-02 0.990304 > > This can undoubtedly benefit from a larger sieve also, > which I'm testing in a different patch. > > On Thu, Mar 5, 2020 at 6:05 PM Seth Troisi <brain...@gmail.com> wrote: > >> I tried using a segmented sieve. this avoids multiplication or mod in the >> inner loop. >> It's an improvement and the code is easy to comprehend. >> >> >> >> >> On Wed, Feb 12, 2020 at 10:25 AM Niels Möller <ni...@lysator.liu.se> >> wrote: >> >>> "Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: >>> >>> > Ciao, >>> > >>> > Il Mer, 12 Febbraio 2020 7:26 am, Niels Möller ha scritto: >>> >> And for a small prime gap (common case), this cost should be the main >>> >> part of the sieving. So it would make sense to optimize it. If we also >>> >> use the ptab, we could compute modulo single-limb prime products using >>> >> precomputed constants. >>> > >>> > Yes, of course. >>> >>> Another version below, using ptab. Timing: >>> >>> $ ./tune/speed -s 100-300,500,1000 -f 1.1 mpz_nextprime >>> mpz_nextprime >>> 100 0.000116762 >>> 110 0.000126770 >>> 121 0.000147203 >>> 133 0.000257994 >>> 146 0.000291051 >>> 160 0.000343168 >>> 176 0.000386605 >>> 193 0.000625994 >>> 212 0.000670035 >>> 233 0.000763240 >>> 256 0.000942154 >>> 281 0.001418639 >>> 500 0.006632588 >>> 1000 0.073935992 >>> >>> So also close to 15% speedup for 100 bits, and the same as previous >>> version, almost 25%, for 1000 bits. >>> >>> I used the very crude tuning >>> >>> limit = (nbits < 200) ? nbits / 2 : PTAB_SIZE; >>> >>> to avoid regressing for smaller sizes. >>> >>> > And the next question will be: should we generate on the fly an even >>> > larger table of primes when the required size grows beyond the >>> > precomputed table? >>> >>> Don't know. I don't quite understand how large a table could be >>> beneficial, but I guess it might make sense to at least limit it to size >>> of L1 cache. >>> >>> Regards, >>> /Niels >>> >>> /* mpz_nextprime(p,t) - compute the next prime > t and store that in p. >>> >>> Copyright 1999-2001, 2008, 2009, 2012 Free Software Foundation, Inc. >>> >>> Contributed to the GNU project by Niels Möller and Torbjorn Granlund. >>> >>> This file is part of the GNU MP Library. >>> >>> The GNU MP Library is free software; you can redistribute it and/or >>> modify >>> it under the terms of either: >>> >>> * the GNU Lesser General Public License as published by the Free >>> Software Foundation; either version 3 of the License, or (at your >>> option) any later version. >>> >>> or >>> >>> * the GNU General Public License as published by the Free Software >>> Foundation; either version 2 of the License, or (at your option) any >>> later version. >>> >>> or both in parallel, as here. >>> >>> The GNU MP Library is distributed in the hope that it will be useful, but >>> WITHOUT ANY WARRANTY; without even the implied warranty of >>> MERCHANTABILITY >>> or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >>> for more details. >>> >>> You should have received copies of the GNU General Public License and the >>> GNU Lesser General Public License along with the GNU MP Library. If not, >>> see https://www.gnu.org/licenses/. */ >>> >>> #include "gmp-impl.h" >>> #include "longlong.h" >>> >>> #if 1 >>> struct gmp_primes_dtab { >>> mp_limb_t p; >>> mp_limb_t binv; >>> mp_limb_t lim; >>> }; >>> >>> struct gmp_primes_ptab { >>> mp_limb_t ppp; /* primes, multiplied together */ >>> mp_limb_t cps[7]; /* ppp values pre-computed for mpn_mod_1s_4p */ >>> gmp_uint_least32_t idx:24; /* index of first primes in dtab */ >>> gmp_uint_least32_t np :8; /* number of primes related to this >>> entry */ >>> }; >>> >>> static const struct gmp_primes_dtab dtab[] = >>> { >>> #define WANT_dtab >>> #define P(p,inv,lim) {p, inv,lim} >>> #include "trialdivtab.h" >>> #undef WANT_dtab >>> #undef P >>> }; >>> >>> #define DTAB_SIZE (sizeof (dtab) / sizeof (dtab[0])) >>> >>> static const struct gmp_primes_ptab ptab[] = >>> { >>> #define WANT_ptab >>> #include "trialdivtab.h" >>> #undef WANT_ptab >>> }; >>> >>> #define PTAB_SIZE (sizeof (ptab) / sizeof (ptab[0])) >>> >>> void >>> mpz_nextprime (mpz_ptr p, mpz_srcptr n) >>> { >>> mp_limb_t *moduli; >>> unsigned long difference; >>> int i; >>> unsigned prime_limit; >>> mp_bitcnt_t nbits; >>> unsigned incr; >>> TMP_DECL; >>> >>> /* First handle tiny numbers */ >>> if (mpz_cmp_ui (n, 2) < 0) >>> { >>> mpz_set_ui (p, 2); >>> return; >>> } >>> mpz_add_ui (p, n, 1); >>> mpz_setbit (p, 0); >>> >>> if (mpz_cmp_ui (p, 7) <= 0) >>> return; >>> >>> MPN_SIZEINBASE_2EXP(nbits, PTR(p), SIZ(p), 1); >>> >>> if (nbits < 200) >>> { >>> prime_limit = nbits / 2; >>> if (SIZ(p) == 1) >>> { >>> /* If the prime list includes any prime >= p, do plain >>> search. */ >>> mp_limb_t limb = PTR(p)[0]; >>> unsigned last = ptab[prime_limit-1].idx + >>> ptab[prime_limit-1].np - 1; >>> if (dtab[last].p >= limb) >>> { >>> while (dtab[last-1].p >= limb) >>> last--; >>> PTR(p)[0] = dtab[last].p; >>> return; >>> } >>> } >>> } >>> else >>> prime_limit = PTAB_SIZE; >>> >>> TMP_MARK; >>> >>> /* Compute residues modulo small odd primes */ >>> moduli = TMP_ALLOC_TYPE (prime_limit, mp_limb_t); >>> >>> for (;;) >>> { >>> for (i = 0; i < prime_limit; i++) >>> { >>> mp_limb_t ppp = ptab[i].ppp; >>> const mp_limb_t *cps = ptab[i].cps; >>> moduli[i] = mpn_mod_1s_4p (PTR(p), SIZ(p), ppp << cps[1], cps); >>> } >>> >>> #define INCR_LIMIT 0x10000 /* deep science */ >>> >>> for (difference = incr = 0; incr < INCR_LIMIT; difference += 2) >>> { >>> /* First check divisibility based on prime list */ >>> for (i = 0; i < prime_limit; i++) >>> { >>> /* FIXME: Ensure that this addition can't overflow */ >>> mp_limb_t m = moduli[i] + incr; >>> int idx = ptab[i].idx; >>> int np = ptab[i].np; >>> int j; >>> >>> for (j = 0; j < np; j++) >>> if (m * dtab[idx+j].binv <= dtab[idx+j].lim) >>> goto next; >>> } >>> >>> mpz_add_ui (p, p, difference); >>> difference = 0; >>> >>> /* Miller-Rabin test */ >>> if (mpz_millerrabin (p, 25)) >>> goto done; >>> next:; >>> incr += 2; >>> } >>> mpz_add_ui (p, p, difference); >>> difference = 0; >>> } >>> done: >>> TMP_FREE; >>> } >>> #else >>> static const unsigned char primegap[] = >>> { >>> 2,2,4,2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,14,4,6, >>> 2,10,2,6,6,4,6,6,2,10,2,4,2,12,12,4,2,4,6,2,10,6,6,6,2,6,4,2,10,14,4,2, >>> 4,14,6,10,2,4,6,8,6,6,4,6,8,4,8,10,2,10,2,6,4,6,8,4,2,4,12,8,4,8,4,6, >>> >>> 12,2,18,6,10,6,6,2,6,10,6,6,2,6,6,4,2,12,10,2,4,6,6,2,12,4,6,8,10,8,10,8, >>> >>> 6,6,4,8,6,4,8,4,14,10,12,2,10,2,4,2,10,14,4,2,4,14,4,2,4,20,4,8,10,8,4,6, >>> 6,14,4,6,6,8,6,12 >>> }; >>> >>> #define NUMBER_OF_PRIMES 167 >>> >>> void >>> mpz_nextprime (mpz_ptr p, mpz_srcptr n) >>> { >>> unsigned short *moduli; >>> unsigned long difference; >>> int i; >>> unsigned prime_limit; >>> unsigned long prime; >>> mp_size_t pn; >>> mp_bitcnt_t nbits; >>> unsigned incr; >>> TMP_SDECL; >>> >>> /* First handle tiny numbers */ >>> if (mpz_cmp_ui (n, 2) < 0) >>> { >>> mpz_set_ui (p, 2); >>> return; >>> } >>> mpz_add_ui (p, n, 1); >>> mpz_setbit (p, 0); >>> >>> if (mpz_cmp_ui (p, 7) <= 0) >>> return; >>> >>> pn = SIZ(p); >>> MPN_SIZEINBASE_2EXP(nbits, PTR(p), pn, 1); >>> if (nbits / 2 >= NUMBER_OF_PRIMES) >>> prime_limit = NUMBER_OF_PRIMES - 1; >>> else >>> prime_limit = nbits / 2; >>> >>> TMP_SMARK; >>> >>> /* Compute residues modulo small odd primes */ >>> moduli = TMP_SALLOC_TYPE (prime_limit, unsigned short); >>> >>> for (;;) >>> { >>> /* FIXME: Compute lazily? */ >>> prime = 3; >>> for (i = 0; i < prime_limit; i++) >>> { >>> moduli[i] = mpz_tdiv_ui (p, prime); >>> prime += primegap[i]; >>> } >>> >>> #define INCR_LIMIT 0x10000 /* deep science */ >>> >>> for (difference = incr = 0; incr < INCR_LIMIT; difference += 2) >>> { >>> /* First check residues */ >>> prime = 3; >>> for (i = 0; i < prime_limit; i++) >>> { >>> unsigned r; >>> /* FIXME: Reduce moduli + incr and store back, to allow for >>> division-free reductions. Alternatively, table >>> primes[]'s >>> inverses (mod 2^16). */ >>> r = (moduli[i] + incr) % prime; >>> prime += primegap[i]; >>> >>> if (r == 0) >>> goto next; >>> } >>> >>> mpz_add_ui (p, p, difference); >>> difference = 0; >>> >>> /* Miller-Rabin test */ >>> if (mpz_millerrabin (p, 25)) >>> goto done; >>> next:; >>> incr += 2; >>> } >>> mpz_add_ui (p, p, difference); >>> difference = 0; >>> } >>> done: >>> TMP_SFREE; >>> } >>> >>> #endif >>> >>> -- >>> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. >>> Internet email is subject to wholesale government surveillance. >>> >>
diff -r bcca14c8a090 mpz/nextprime.c --- a/mpz/nextprime.c Thu Mar 05 00:43:26 2020 +0100 +++ b/mpz/nextprime.c Fri Mar 06 15:52:15 2020 -0800 @@ -33,6 +33,50 @@ #include "gmp-impl.h" #include "longlong.h" +// Hack while testing speed of wrapper + +int is_prime[200] = { +976371562, 976371616, 976371732, 976371810, 976371862, +976371912, 976371928, 976372096, 976372182, 976372206, +976372222, 976372240, 976372302, 976372380, 976372416, +976372422, 976372498, 976372632, 976372762, 976372818, +499445307, 499445769, 499446099, 499446327, 499446357, +499446537, 499446575, 499446599, 499446665, 499446785, +499446795, 499446813, 499446845, 499446899, 499446989, +499447085, 499447149, 499447235, 499447389, 499447619, +322050916, 322051144, 322051390, 322051810, 322051830, +322051872, 322052014, 322052160, 322052484, 322052532, +322052700, 322052806, 322052850, 322052950, 322053016, +322053214, 322053876, 322054030, 322054306, 322054516, +390483062, 390483142, 390483328, 390483560, 390483604, +390484082, 390484280, 390484624, 390484828, 390484918, +390484952, 390485044, 390485204, 390485710, 390485882, +390485900, 390486074, 390486164, 390486434, 390486772, +688423507, 688429685, 688430003, 688433431, 688433837, +688434047, 688434083, 688438613, 688439131, 688439215, +688441481, 688442341, 688443121, 688444045, 688447163, +688448081, 688449137, 688449391, 688449541, 688450603, +749219357, 749219441, 749223757, 749224489, 749226371, +749229451, 749230151, 749230903, 749231839, 749231863, +749232413, 749232641, 749233363, 749233633, 749234701, +749242201, 749242717, 749245849, 749246017, 749246227, +860194732, 860194900, 860195722, 860201212, 860207378, +860208208, 860208476, 860216788, 860217086, 860218496, +860219848, 860225012, 860236540, 860249998, 860251442, +860263046, 860263790, 860264198, 860264780, 860267018, +}; + +int +mpz_is_prime_lookup(mpz_ptr p) { + int lookup = mpz_fdiv_ui(p, 1000000007); + for (int i = 0; i < 160; i++) + if (is_prime[i] == lookup) { + return 1; + } + return 0; +} + + static const unsigned char primegap[] = { 2,2,4,2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,14,4,6, @@ -115,7 +159,8 @@ difference = 0; /* Miller-Rabin test */ - if (mpz_millerrabin (p, 25)) +// if (mpz_millerrabin (p, 25)) + if (mpz_is_prime_lookup(p)) goto done; next:; incr += 2; diff -r bcca14c8a090 tests/mpz/t-nextprime.c --- a/tests/mpz/t-nextprime.c Thu Mar 05 00:43:26 2020 +0100 +++ b/tests/mpz/t-nextprime.c Fri Mar 06 15:52:15 2020 -0800 @@ -20,6 +20,7 @@ #include <stdio.h> #include <stdlib.h> +#include <sys/time.h> #include "gmp-impl.h" #include "tests.h" @@ -97,8 +98,10 @@ for (i = 0; i < reps; i++) { + gmp_printf("{\"%Zd\": ", x); mpz_nextprime (y, x); mpz_sub (x, y, x); + gmp_printf("%d},\n", mpz_get_ui(x)); if (diffs != NULL && (! mpz_fits_sshort_p (x) || diffs[i] != (short) mpz_get_ui (x))) { @@ -170,24 +173,59 @@ rands = RANDS; TESTS_REPS (reps, argv, argc); + + struct timeval tval_before, tval_after, tval_result; - test_ref(rands, reps); + mpz_t n, m, d; + mpz_init(n); + mpz_init(m); + mpz_init(d); - run ("2", 1000, "0x1ef7", diff1); + int sizes[] = {100, 200, 300, 500, 1000, 2000, 5000}; + for (int i = 0; i < sizeof(sizes)/sizeof(sizes[0]); i++) { + int size = sizes[i]; + + mpz_ui_pow_ui(n, 2, size); + mpz_set(m, n); - run ("3", 1000 - 1, "0x1ef7", NULL); + gettimeofday(&tval_before, NULL); + + for (int t = 0; t < 20; t++) { + mpz_nextprime(n, n); + mpz_sub(d, n, m); + // gmp_printf("%d,\n", mpz_fdiv_ui(n, 1000000007)); // Poor man's hash + } - run ("0x8a43866f5776ccd5b02186e90d28946aeb0ed914", 50, - "0x8a43866f5776ccd5b02186e90d28946aeb0eeec5", diff3); + gettimeofday(&tval_after, NULL); + timersub(&tval_after, &tval_before, &tval_result); + float seconds = tval_result.tv_sec; + seconds += tval_result.tv_usec / 1e6; - run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6C", 50, /* 2^148 - 148 */ - "0x100000000000000000000000000000000010ab", diff4); + mpz_sub(m, n, m); + gmp_printf("%d, gap: %Zd, %.5f seconds\n", size, m, seconds); + } + + mpz_clear(n); + mpz_clear(m); + mpz_clear(d); + +// test_ref(rands, reps); - run ("0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898d8b1b", 50, - "0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898da957", diff5); - - run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF80", 50, /* 2^128 - 128 */ - "0x10000000000000000000000000000155B", diff6); +// run ("2", 1000, "0x1ef7", diff1); +// +// run ("3", 1000 - 1, "0x1ef7", NULL); +// +// run ("0x8a43866f5776ccd5b02186e90d28946aeb0ed914", 50, +// "0x8a43866f5776ccd5b02186e90d28946aeb0eeec5", diff3); +// +// run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6C", 50, /* 2^148 - 148 */ +// "0x100000000000000000000000000000000010ab", diff4); +// +// run ("0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898d8b1b", 50, +// "0x1c2c26be55317530311facb648ea06b359b969715db83292ab8cf898da957", diff5); +// +// run ("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF80", 50, /* 2^128 - 128 */ +// "0x10000000000000000000000000000155B", diff6); // Too slow to include in normal testing. //test_largegaps ();
_______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel