Re: question on fixing the mpz_sizeinbase()
Ciao,

I answer here to a question from gmp-discuss.

17 mar 2024, 01:28 da t...@gmplib.org:
> "website.reader" writes:
>> Here's the suggested fix for this command in a C++ code unit
> This question arises once every couple of years, more or less...

Should we write a new mpz_sizeinbase_exactbutpossiblymuchslower() function?

Ideas:
- use the vlog[] table we already have in mpn/generic/rootrem.c to compute some "decimal digits" of the logarithm base 2 of the number, to discriminate more exactly between the "digits" and the "digits+1" cases based on the higher bits of the number. This keeps on being O(1), but should narrow the range of cases where we then need to
- compute a sort of ui_powhi_ui (we will have a use for mulhi :-), or finally
- compute ui_pow_ui and fully compare, giving the exact result.

Do you feel it may be useful? It would be more clever than what the user can simply do themselves.

Ĝis, mb
___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Has there been historical work done to investigate small integer optimization?
Ciao,

12 feb 2024, 11:54 da paul.zimmerm...@inria.fr:
> as Niels said, I fear the cost of the tests might make the benefit vanish.
> But to be sure, the only way is to try to implement this idea inside GMP,
> which involves quite some work.

But implementing it with the current mpz type is "impossible". I mean, one would have to break the current interface. Currently, _mp_d is always a pointer AND it always points to a readable limb, even if _mp_alloc is zero.

To implement the idea inside GMP one would have to choose:
- break the current interface, or
- fully implement a new layer (not mpn, mpz, mpf, but... mpZ?).

Ĝis, mb
Re: mpn_mulhigh_basecase for Broadwell
Thank you Albin,

currently we don't have any mulhi function, so we haven't decided on a possible interface for it. There are some comments in the code where a full mul is used but some "mulhi" function could be enough. Many of them seem to use unbalanced operands, and might require the exact result or more limbs than the higher half...

By the way, I like the idea of computing one more limb: that way one may know whether the limbs obtained by mulhi are exact or not.

Ĝis, mb

6 feb 2024, 23:27 da albin.ahlb...@gmail.com:
> Hello,
>
> I just wrote an implementation of mpn_mulhigh_basecase for Broadwell-type
> processors (that is, x86_64 with BMI2 and ADX instructions) based on
> Torbjörn's `mpn_mullo_basecase'.
>
> It is currently declared in the form
>
>   mp_limb_t flint_mpn_mulhigh_basecase (mp_ptr rp, mp_srcptr xp,
>                                         mp_srcptr yp, mp_size_t n),
>
> as it was developed for FLINT (Fast Library for Number Theory). Note that
> `rp' cannot be aliased with `xp' or `yp'.
>
> It will compute an approximation of the uppermost `n + 1' limbs of `{xp, n}'
> times `{yp, n}', where the uppermost `n' limbs are stored at `rp[0]', ...,
> `rp[n - 1]' and the least significant computed limb is returned (via %rax).
> This returned limb should have an error of about `n' ULPs.
>
> Note that this differs from MPFR's (private) function `mpfr_mulhigh_n', which
> computes the approximation of the uppermost `n' limbs into `rp[n]', ...,
> `rp[2 * n - 1]', where `rp[n]' has an error of at most about `n' ULPs.
>
> Feel free to change it according to your needs (perhaps you do not want to
> compute `n + 1' limbs, but rather `n' limbs).
>
> If this code is used in GMP, feel free to remove the copyright claim for
> FLINT and put my name (spelled Albin Ahlbäck) in the GMP copyright claim
> instead.
>
> Just some notes:
>
> - We use our own M4 syntax for the beginning and ending of the function, but
>   it should be easy to translate to GMP's syntax.
> - It currently only works for n > 5 (I believe), as we in FLINT have
>   specialized routines for small n.
> - It would be nice to avoid pushing five registers, and only push four.
> - Reduce the size of the `L(end)' code, and try to avoid multiple jumps
>   therein.
> - Move the code-snippet of `L(f2)' to just above `L(b2)', so that no jump is
>   needed in between. (This currently does not work because `L(end)' as well
>   as this code-snippet is too large for a relative 8-bit jump.)
> - Start out with a mul_1 sequence with just a mulx+add+adc chain, just like
>   in `mpn_mullo_basecase'.
> - Remove the first multiplication in each `L(fX)' and put it in `L(end)'
>   instead.
> - The `adcx' instruction in `L(fX)' can be removed (then one needs to adjust
>   the `L(bX)' label), but I found it to be slower. Can we remove it and
>   somehow maintain the same performance?
>
> Best,
> Albin
Re: Tuneup using size_ratio
Ciao,

23 dic 2023, 21:02 da ni...@lysator.liu.se:
> marco.bodr...@tutanota.com writes:
>> But if I measure the speed with the ratio 0.7 or 0.8 I get quite
>> different thresholds, which one should we use?
> Maybe the crossover is rather flat, i.e., a wide range with very small
> difference in performance?

Actually, looking at the colored two-dimensional maps we still have under the title "New Toom multiplication code for unbalanced operands" on the page https://gmplib.org/devel/ , I believe that the lines representing the thresholds are more complex than a straight un=constant or vn=constant. (un+vn is more appropriate when dealing with FFT, but not necessarily good on other border lines.) I would like to reproduce those maps with the current toom6h and toom8h, to see what has changed.

> I hope these changes should make it easier to
> investigate such things.

I believe those changes are interesting! The threshold system is complex for the unbalanced operations, anyway. Those changes suggest that we mark more clearly, in the names of the thresholds, which line was used to measure each one. I mean, while it is probably quite obvious that the threshold toom42-toom63 is measured on a 2x1 ratio line, it may not be as obvious for other thresholds. Moreover, on the 2x1 line we have MUL_TOOM42_TO_TOOM63_THRESHOLD, but we are not measuring the TOOM63_TO_TOOM6h and TOOM6h_TO_TOOM8h thresholds on that line. Should we use those functions?

> I intend to push this change in a few days. Have a nice Christmas!

Might it be useful to also add the "/r" option for mpn_nussbaumer_mul?

And now, have a happy new year!

Ĝis! mb
Re: [PATCH] Fix typo in z13 bdiv_dbm1c.
Ciao,

10 ott 2023, 09:14 da t...@gmplib.org:
> Stefan Liebler writes:
>> In my case, before calling mpn_divexact_by3c v6 was {0x1, 0x0} and thus
>> OK.
> We should clearly have tests/*call.asm and tests/*check.c for
> s390_64 and s390_32. (But the check file might want to re-randomise the
> magic register values after checking them.)

Another approach, simpler in my opinion, is to write a test checking the return value of the function mpn_divexact_by3c. The manual documents exactly that "the return value c satisfies c*b^n + a - i = 3*q, where...", so we should test the return value. In the library we have some ASSERTs checking whether the return value is zero or not, but I believe we have no place where the correct return value is really checked.

Does the attached test detect the suspicious code?

Ĝis, m

/* Test for the return value of mpn_divexact_by3c.

Contributed to the GNU project by Marco Bodrato.

Copyright 2023 Free Software Foundation, Inc.

This file is part of the GNU MP Library test suite.

The GNU MP Library test suite is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 3 of the License,
or (at your option) any later version.

The GNU MP Library test suite is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along
with the GNU MP Library test suite.  If not, see
https://www.gnu.org/licenses/.  */

#include <stdio.h>
#include <stdlib.h>

#include "gmp-impl.h"
#include "tests.h"

/* Sizes are up to 2^SIZE_LOG limbs */
#ifndef SIZE_LOG
#define SIZE_LOG 7
#endif

#ifndef COUNT
#define COUNT 5000
#endif

#define MAX_N (1 << SIZE_LOG)
#define MIN_N 1

int
main (int argc, char **argv)
{
  mp_ptr np, n3p, rp;
  int count = COUNT;
  int test;
  gmp_randstate_ptr rands;
  TMP_DECL;

  TMP_MARK;

  TESTS_REPS (count, argv, argc);

  tests_start ();
  rands = RANDS;

  np = TMP_ALLOC_LIMBS (MAX_N);
  n3p = TMP_ALLOC_LIMBS (MAX_N + 1);
  rp = 1 + TMP_ALLOC_LIMBS (MAX_N + 1);

  for (test = 0; test < count; test++)
    {
      unsigned size_min;
      unsigned size_range;
      mp_size_t n;
      mp_limb_t r_before, r_after;
      mp_limb_t res, expected;

      for (size_min = 1; (1L << size_min) < MIN_N; size_min++)
	;

      /* We generate n in the range MIN_N <= n <= (1 << size_range). */
      size_range = size_min
	+ gmp_urandomm_ui (rands, SIZE_LOG + 1 - size_min);

      n = MIN_N + gmp_urandomm_ui (rands, (1L << size_range) + 1 - MIN_N);

      mpn_random (np, n);
      mpn_random2 (rp - 1, n + 2);
      r_before = rp[-1];
      r_after = rp[n + 1];

      expected = mpn_mul_1 (n3p, np, n, 3);
      if ((n & 1) == 0)
	{
	  res = mpn_divexact_by3 (rp, n3p, n / 2);
	  res = mpn_divexact_by3c (rp + n / 2, n3p + n / 2, n / 2, res);
	}
      else
	res = mpn_divexact_by3c (rp, n3p, n, 0);

      if (rp[-1] != r_before || rp[n + 1] != r_after
	  || mpn_cmp (rp, np, n) != 0 || res != expected)
	{
	  printf ("ERROR in test %d, n = %d\n", test, (int) n);
	  if (rp[-1] != r_before)
	    {
	      printf ("before rp:"); mpn_dump (rp - 1, 1);
	      printf ("keep:     "); mpn_dump (&r_before, 1);
	    }
	  if (rp[n + 1] != r_after)
	    {
	      printf ("after rp:"); mpn_dump (rp + n + 1, 1);
	      printf ("keep:    "); mpn_dump (&r_after, 1);
	    }
	  if (res != expected)
	    {
	      printf ("retval  :"); mpn_dump (&res, 1);
	      printf ("expected:"); mpn_dump (&expected, 1);
	    }
	  mpn_dump (np, n);
	  mpn_dump (n3p, n);
	  mpn_dump (rp, n);
	  abort ();
	}
      ASSERT_ALWAYS (res < 3);
    }

  TMP_FREE;
  tests_end ();
  return 0;
}
gcc -fanalyzer
Ciao,

gcc complains about a missing call to 'va_end' in scanf/doscan.c and printf/doprnt.c.

man stdarg on my Debian reads: "Each invocation of va_copy() must be matched by a corresponding invocation of va_end() in the same function." So gcc is probably right. I assume we never noticed because va_end is probably a no-op, but we may want to correct it anyway.

Correcting scanf/doscan.c is quite easy, I'd suggest:

~/src/gmp$ hg diff scanf/doscan.c
diff -r 8225bdfc499f scanf/doscan.c
--- a/scanf/doscan.c	Tue Sep 05 18:32:26 2023 +0200
+++ b/scanf/doscan.c	Sat Sep 16 18:40:28 2023 +0200
@@ -761,6 +761,7 @@
     }

  done:
+  va_end (ap);
   (*__gmp_free_func) (alloc_fmt, alloc_fmt_size);
   return fields;
 }
~/src/gmp$

On the printf/doprnt.c side, I'm not sure. There is a "va_copy (ap" at the beginning, so we can va_end before returning. Each loop starts with a "va_copy (this_ap" and can end with a corresponding va_end. But last_ap is va_copy-ed again and again; should we insert a va_end before each new copy?

I attach a possible patch for printf/doprnt.c.

Ĝis, m

diff -r 8225bdfc499f printf/doprnt.c
--- a/printf/doprnt.c	Tue Sep 05 18:32:26 2023 +0200
+++ b/printf/doprnt.c	Sat Sep 16 18:59:51 2023 +0200
@@ -340,6 +340,7 @@
	    ret = __gmp_doprnt_integer (funs, data, &format, gmp_str);
	    __GMP_FREE_FUNC_TYPE (gmp_str, strlen (gmp_str) + 1, char);
	    DOPRNT_ACCUMULATE (ret);
+	    va_end (last_ap);
	    va_copy (last_ap, ap);
	    last_fmt = fmt;
@@ -372,6 +373,7 @@
	    DOPRNT_ACCUMULATE (__gmp_doprnt_mpf (funs, data, &format,
						 GMP_DECIMAL_POINT,
						 va_arg (ap, mpf_srcptr)));
+	    va_end (last_ap);
	    va_copy (last_ap, ap);
	    last_fmt = fmt;
	    break;
@@ -494,6 +496,7 @@
	    case 'Z': mpz_set_si ((mpz_ptr) p, (long) retval); break;
	    }
	  }
+	  va_end (last_ap);
	  va_copy (last_ap, ap);
	  last_fmt = fmt;
	  goto next;
@@ -604,7 +607,7 @@
     next:
       /* Stop parsing the current "%" format, look for a new one. */
-      ;
+      va_end (this_ap);
     }

   TRACE (printf ("remainder: \"%s\"\n", last_fmt));
@@ -616,6 +619,8 @@
   goto error;

  done:
+  va_end (last_ap);
+  va_end (ap);
   __GMP_FREE_FUNC_TYPE (alloc_fmt, alloc_fmt_size, char);
   return retval;
Re: Fast constant-time gcd computation and modular inversion
Il 2022-09-01 17:04 Torbjörn Granlund ha scritto:

  /* FIXME: Using mpz_invert is cheating. Instead, first compute
     m' = m^-1 (mod 2^k) via Newton/Hensel. We can then get the inverse via
     2^{-k} (mod m) = ((2^k - m') * m + 1) / 2^k. */
  mpz_invert (t, t, m);
  mpn_copyi (info->ip, mpz_limbs_read (t), mpz_size (t));

You might want to use mpn_binvert here. We should start writing mpn_sec_binvert :-) Or use the current mpn_sec_invert for the precomputation.

Ĝis, m
Re: More accurate calculation of sieve array size
Ciao Adrien,

Il 2022-08-11 17:43 Adrien Prost-Boucle ha scritto:
> For primorial, when thinking on the code I considered that the call to
> mpz_prodlimbs (x, factors, j) is the one responsible for (re)sizing x to
> whatever is appropriate for the result.

Indeed, prodlimbs can resize the mpz_t to the needed size.

> (and I think I got confused with the reuse of the name x in a nested loop)

This was a (stylistic) error of mine. https://gmplib.org/repo/gmp/rev/d337756619a0

> If sizing x correctly before the call to mpz_prodlimbs() guarantees that
> mpz_prodlimbs() will not resize it, then the current implementation is
> efficient indeed.

That's the aim of the code: avoid resizing. I also had another goal: in the future, maybe move the computations to the mpn level...

Does the code succeed in avoiding the resize? But maybe it always allocates a too large area (that's a waste too)? The code reads:

  size = n / GMP_NUMB_BITS;
  size = size + (size >> 1) + 1;

This means that we reserve an area for a result ≤ 2^(n*3/2) ≤ (2.829)^n, right? Wikipedia [ https://en.wikipedia.org/wiki/Primorial ] says: "Using more advanced methods, Rosser and Schoenfeld showed that" n# ≤ (2.763)^n, "and in Theorem 4, formula 3.14, showed that" for n ≥ 563, n# ≥ (2.22)^n.

Does our code make sense, is it tight enough?

> This could have deserved a bit of comment ;-)

If we are sure that the trick works... yes, we should write something :-)

Ĝis, m
-- http://bodrato.it/papers/
Re: More accurate calculation of sieve array size
Ciao Adrien,

Il 2022-08-10 15:37 Adrien Prost-Boucle ha scritto:
> Looking into the oddfac and primorial computation code, I think there is
> a suboptimality in the computation of the prime sieve size.
> Could anyone confirm?

Maybe there are some too-conservative assumptions, but we must be careful with the code.

> --- mpz/oddfac_1.old	2022-08-10 15:14:04.511298108 +0200
> +++ mpz/oddfac_1.c	2022-08-10 15:31:34.721354647 +0200
> @@ -381,8 +381,8 @@
>    TMP_MARK;
>    flag--;
> -  size = n / GMP_NUMB_BITS + 4;
> -  ASSERT (primesieve_size (n - 1) <= size - (size / 2 + 1));
> +  size = (n / 3 / GMP_NUMB_BITS + 1) * 2;
> +  ASSERT (primesieve_size (n - 1) <= size / 2);
>    /* 2-multiswing(n) < 2^(n-1)*sqrt(n/pi) < 2^(n+GMP_NUMB_BITS);
>       one more can be overwritten by mul, another for the sieve */
>    MPZ_TMP_INIT (mswing, size);

Here, mswing is not just used for the sieve: it will contain the result of the computation of the "2-multiswing factorial of n". A portion of the same allocated space is used for the sieve (the ASSERT checks: is the size needed for the sieve less than half the size we allocate? Yes? OK, we can go on...).

The comment says that /* 2-multiswing(n) < 2^(n+GMP_NUMB_BITS) */; then it makes sense to allocate n / GMP_NUMB_BITS + 4 limbs. Surely this is an over-estimate, and maybe we can estimate it better. But to reduce the memory usage, we have to estimate the maximal size of the value that will be stored in mswing, not of the sieve only.

> --- mpz/primorial_ui.c.old	2022-08-10 14:44:04.437867901 +0200
> +++ mpz/primorial_ui.c	2022-08-10 14:55:41.721238758 +0200
> @@ -82,8 +82,7 @@
>    mp_limb_t prod;
>    TMP_DECL;
> -  size = n / GMP_NUMB_BITS;
> -  size = size + (size >> 1) + 1;
> +  size = n / 3 / GMP_NUMB_BITS + 1;
>    ASSERT (size >= primesieve_size (n));
>    sieve = MPZ_NEWALLOC (x, size);
>    size = (gmp_primesieve (sieve, n) + 1) / log_n_max (n) + 1;

The same here. The sieve is stored at the same address as x, but we need to estimate the size of the value that x will store, not of the sieve. x will store the result, primorial(n).

The question is: is the allocated size reasonable to store the result x = primorial(n)? The other question, "Can it also contain the sieve?", is secondary.

Ĝis, m
-- http://bodrato.it/papers/
Re: Use of AVX instructions in mpn_mul_1
Ciao Thanassis,

Il 2022-06-13 23:17 Thanassis Tsiodras ha scritto:
> I had a quick look at the x86_64 assembly implementations of the basic
> primitive used in multiplications (mpn_mul_1), and saw this:
> ...I could not find any use of AVX-integer-related multiplication
> instructions. I am talking about things like "_mm512_mul_epu32", which at
> first glance seemed promising (8x32bit multiplications in one instruction,
> generating 8x64-bit results in one go).

Four 32x32->64 multiplications perform the same multiplication work as one 64x64->128. But are "8x32bit multiplications in one instruction" faster than two 64x64 muls? As you confirm, many other additions with carry propagation are needed. So the question is: does using AVX reduce the resources needed for a multiplication? I can't see a way to do that optimally.

> Is that the reason GMP asm code seems to prefer the simple 64x64 => 128
> instructions? (mul %rcx)

When you find an implementation with AVX more efficient than our current implementation, you can contribute it to the project :-)

Ĝis, m
Re: Squaring optimization in mpz_addmul/mpz_submul
Ciao Fredrik,

Il 2022-05-25 22:48 Fredrik Johansson ha scritto:
> Now that mpn_mul no longer dispatches to mpn_sqr, the squaring
> optimization is missing in mpz_addmul/mpz_submul.

Thanks for spotting this!

> mpz_addmul(sum, a, a);
> This should be easy to fix.

It is. I fixed it with a small patch. https://gmplib.org/repo/gmp/rev/61ef108d740c

But I had to also add that case to the test program. The case mpz_addmul(sum, a, a); was correctly handled, but it wasn't tested. And the case mpz_addmul(x, x, x); remains untested!

Ĝis, m
-- http://bodrato.it/papers/
Re: stronglucas.c
Ciao,

Il 2022-05-18 19:32 Marco Bodrato ha scritto:
> Il Mer, 18 Maggio 2022 7:48 am, Niels Möller ha scritto:
>> Seth Troisi writes:
>>> I was reading the stronglucas code
> Great!
>>> -if (((*PTR (n) & 6) == 0) && UNLIKELY (mpz_perfect_square_p (n)))
> Our test-suite did not trigger that branch.

Now it does: https://gmplib.org/repo/gmp/rev/7ecb57e1bb82

I took the list of base-2 pseudo-primes by Jan Feitsma, published at http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html . Throwing the numbers in the list through the current implementation, it was easy to find examples triggering the different branches! Very useful!

> the whole "((*PTR (n) & 6) == 0) &&" code is an optimization, [...]
> Should we avoid to repeat the check here and call the function?

Maybe we can directly call the function... On the other side, maybe we should avoid some calls to jacobi...

The current search for an odd D such that the Jacobi symbol (n/|D|) is negative is performed by the following code:

  D = GMP_NUMB_BITS % 16 == 0 ? (GMP_NUMB_BITS % 32 == 0 ? 17 : 15) : 5;
  do
    {
      if (UNLIKELY (D >= maxD))
	return 1;
      D += 2;
      jac = mpz_oddjacobi_ui (n, D);
    }
  while (jac == 1);

If we already checked 15, we may skip all the composite D. Should we at least use code that skips the multiples of 3? Something like:

  unsigned Ddiff = 2;
#if GMP_NUMB_BITS % 16 == 0
  const unsigned D2 = 6;
#if GMP_NUMB_BITS % 32 == 0
  D = 19; Ddiff = 4;
#else
  D = 17;
#endif
#else
  const unsigned D2 = 4;
  D = 7;
#endif
  for (;;)
    {
      jac = mpz_oddjacobi_ui (n, D);
      if (jac != 1)
	break;
      if (UNLIKELY (D >= maxD))
	return 1;
      D += Ddiff;
      Ddiff = D2 - Ddiff;
    }

In the common case (GMP_NUMB_BITS % 32 == 0), I'd expect that about 50% of the numbers entering stronglucas will use D=5 (Fibonacci), 25% D=7, 1/8 D=11, 1/16 D=13, 1/32 D=15, 1/64 D=17, so that only 1/64 should use this code. I'd expect that D=19 will be used for one half of them, but... should we check D=21 for the other half?

Ĝis, m
Re: stronglucas.c
Ciao,

Il Mer, 18 Maggio 2022 7:48 am, Niels Möller ha scritto:
> Seth Troisi writes:
>> I was reading the stronglucas code

Great!

>> diff -r 970b7221873f mpz/stronglucas.c
>>  /* n is odd, to possibly be a square, n % 8 = 1 is needed. */
>> -if (((*PTR (n) & 6) == 0) && UNLIKELY (mpz_perfect_square_p (n)))
>> +if (((*PTR (n) & 7) == 1) && UNLIKELY (mpz_perfect_square_p (n)))
>>    return 0; /* A square is composite. */
>> Technically n/x should be odd per the function comment but IMO this
>> improves readability by having the "n % 8 = 1" in the nearer comment
>> match the code.
> I think the way it's currently written is a micro optimization, testing
> That said, I don't think this code is that performance critical, so
> saving a single instruction doesn't matter much. I agree your change
> improves readability a bit.

Shall we remove a (micro) optimization, or shall we improve the comment? Anyway, the whole "((*PTR (n) & 6) == 0) &&" code is only an optimization: the first check done by the function mpz_perfect_square_p is based on the lowest bits. Should we avoid repeating the check here and simply call the function?

>> It's also possible lines 122-124 should also be indented
> Indentation looks right to me, at the current place. But perhaps more
> consistent with nearby code if comments are moved below the "else if"
> line (and then indented accordingly for that location).

Line 122, /* (2^24 - 1) | (2^{GMP_NUMB_BITS*3/4} - 1) */, is indented as line 97, /* (2^12 - 1) | (2^{GMP_NUMB_BITS*3/4} - 1) */; 123 as 99. Those comments should explain what happens with the given GMP_NUMB_BITS values. But I agree, there should also be a comment after the "else if", explaining how the check for D=17 works.

There are also other comments that should be improved. The comment to the function (line 80) reads: /* Requires GCD (x,6) = 1. */ Inside the code included by #if GMP_NUMB_BITS % 16 == 0 we have (line 100)

  ASSERT (g % 3 != 0 && g % 5 != 0 && g % 7 != 0);

but I fear that the code actually also assumes g % 11 != 0 && g % 13 != 0 && g % 17 != 0. Not a problem with the current use in the library (this code is called after some trial-division checks), but it is more difficult to read the code if the comments are not correct.

Ĝis, m
Re: about prime number generation
Il 2022-05-01 17:01 Marco Bodrato ha scritto:
> Il 2022-04-30 23:39 Seth Troisi ha scritto:
>> next_prime took up all of my energy
> We should further improve that work! With your improvements to nextprime,
> it's really a waste of resources if one has to write
>   for (int i = 0; i < delta2; i++)
>     mpz_nextprime(p, p);
> because every sieved range is used just once, instead of being scanned to
> the end. I mean, we should add a couple of (public or not?) functions:
> mpz_nth_{prec,next}prime

I wrote a simple mpz_nth_nextprime, and here are some timings:

$ build/tune/speed -p1 -rs 10-60 -t5 mpz_nextprime mpz_nth_nextprime.2 mpz_nth_nextprime.4 mpz_nth_nextprime.16 mpz_nth_nextprime.256
overhead 0.7 secs, precision 1 units of 8.56e-10 secs, CPU freq 1167.94 MHz
        mpz_nextprime nth_nxtprim.2 nth_nxtprim.4 nth_nxtprm.16 nth_nxtp.256
10 #0.011412.43846.1353 42.1803 4318.7026
15 #0.025872.07954.4340 23.0275 1946.8527
20 #0.254721.82043.4797 13.7392237.0212
25 #0.326771.88303.6122 14.1018225.8131
30 #0.406901.89053.6741 14.3550230.1069
35 #0.535111.82853.5060 13.4946215.2175
40 #0.606371.84213.5508 13.6446217.1993
45 #0.700471.82803.4869 13.5221217.8561
50 #0.781571.81003.5152 13.6302216.4944
55 #0.946821.90493.6773 14.4213231.4945
60 #0.0001060161.86423.5877 14.0455231.0008

This means that the 256th next prime of a large enough number (more than 20 bits) is found faster than calling _nextprime 256 times. For smaller numbers this is not evident, for two reasons: the current code uses different strategies for numbers smaller than 20 bits, and the 256th next prime of a 10-bit number is a 12-bit prime, i.e. a much larger number.

But I realize that this is not a clever approach. My code actually finds all the primes and skips them, but this is not very useful. The need for a way to loop over primes emerges again.

Ĝis, m
Re: Use on small mulmod_bnm1 [was: New mulmod_bknp1]
Ciao Niels,

for some reason I did not receive your answer. I took the text from the archive.

Il gio, 21 aprile 2022 08:15 am, Niels Möller ha scritto:
> Marco Bodrato writes:
>> Yes, with a larger expected gain for 3, and a smaller one for 13, or 17.
> And in your table, almost all use 3, and none use 7, 13 or 17.
>> size -> measured time
>> 1080 -> 6.906e-05 (+ 72, +12%) 2^3*3^3*5
>> 1104 -> 7.294e-05 (+ 24, +5.6%) 2^4*3*23
> So 1104 wins over the power of 2, 1024 = 2^10.

Yes, it does. Both on shell and on my laptop, even if in both cases MUL_FFT_MODF_THRESHOLD is smaller than 512.

>> 1584 -> 0.0001159 (+ 72, +4.2%) 2^4*3^2*11
>> 1600 -> 0.0001217 (+ 16, +5.1%) 2^6*5^2
> The only example in the list using 5 as a factor.

There are some more in the full list...

>> 1984 -> 0.0001648 (+ 40, +3.1%) 2^6*31
> First example not using any of the odd numbers in the list, so not using
> mulmod_bknp1 at all.

Yes, starting from here, the power of 2 seems dominant.

>> 2048 -> 0.0001676 (+ 64, +1.7%) 2^11
> I wonder if this would be beaten by 2064 = 2^4*3*43, or if this power
> of two really is a winner.

It is a winner; I measured a much larger range.

> It's also interesting that all these winners use 2^k with k >= 3, so
> third split in mulmod_bnm1 seems to pay off measurably.

On that machine, in that range.

> So just rounding up to a multiple of 24 = 2^3 * 3 might be a reasonable
> initial strategy (above some threshold of a likely few hundred limbs).

The reasonable gaps between the sizes probably are: +2, +4, +8, +12, +24, +48, then powers of two. On shell, something like:
  +2 up to 36
  +4 up to 108
  +12 up to 264
  +24 up to 936
and up to 1944... +24 again? Even if we have many +72 (preserving a factor 3^2)... Then, powers of 2.

Ĝis, m
Re: about prime number generation
Ciao Seth,

Il 2022-04-30 23:39 Seth Troisi ha scritto:
> I worked on nth_prime(n) in a series of patches three years ago

That code was very interesting. It would be nice to get rid of the double type and the function log(). The library is not using math.h and libm anywhere now. But it doesn't seem simple, at least for me.

We should start by pushing a primepi(N) function, counting the primes between 1 and N. I have a function, which is almost ready, with the following prototype:

  long gmp_pi_ui (unsigned long n)

But to have a function that really belongs to GMP, we should extend it to at least mpz_pi (mpz_t n) ... with a reasonable return type ...

> Sadly this never made it upstream as the review process for a faster
> next_prime took up all of my energy

We should further improve that work! With your improvements to nextprime, it's really a waste of resources if one has to write

  for (int i = 0; i < delta2; i++)
    mpz_nextprime(p, p);

because every sieved range is used just once, instead of being scanned to the end. I mean, we should add a couple of (public or not?) functions: mpz_nth_{prec,next}prime

> In practice I'd recommend you also take a look at the highly optimized
> code from kimwalisch https://github.com/kimwalisch/primecount

Of course a specialized library will remain faster than some small piece of code in GMP...

Ĝis, m
Re: Use on small mulmod_bnm1 [was: New mulmod_bknp1]
Ciao,

Il 2022-04-19 21:03 ni...@lysator.liu.se ha scritto:
> Marco Bodrato writes:
>> In the 128-2048 range (at least on that machine: shell.gmplib.org)
>> the sizes multiple of 12, 24, 48... should be preferred...
> I don't fully understand this, but if I got it right, we want the size n
> to be divisible by 2^k, for mulmod_bnm1 to be able to split a few times.
> But we don't need a very large k, since we have diminishing returns for
> each split. And for the new mulmod_bknp1 to fit, we also need n to be
> divisible by one of certain small odd numbers, currently 3, 5, 7, 13, 17.

Yes, with a larger expected gain for 3, and a smaller one for 13, or 17.

> I think it would make sense to first select k (maybe a constant, or
> growing very slowly with n, which might ask for a tuned table), say
> 2 <= k <= 5 for the size range of interest. And then round n upwards to
> the closest multiple of one of 2^k * {3, 5, 7, 13, 17}. Hmm, or maybe to
> make it more complex, one of 2^{k,k-1} * {3, 5, 7, 13, 17}, if that lets
> us avoid growth. It would be nice if we could find a set of candidates
> that guarantees that we don't have to increase size more than, say, 10%,
> but not sure if that's possible.

It should be possible not to increase too much. The following is a list of the best sizes wrt time spent in mulmod_bnm1, extracted in the range 1024..2048. I'm not sure the program I wrote really shows what is needed. Nevertheless, if the list contains 1008 and 1080, this means that for every size in the range (1008..1080) the time for a multiplication using that size is larger than the time spent by the last point in the interval.
size -> measured time
1008 -> 6.156e-05 (+ 72, +8.3%)  2^4*3^2*7
1080 -> 6.906e-05 (+ 72, +12%)   2^3*3^3*5
1104 -> 7.294e-05 (+ 24, +5.6%)  2^4*3*23
1128 -> 7.686e-05 (+ 24, +5.4%)  2^3*3*47
1200 -> 7.986e-05 (+ 72, +3.9%)  2^4*3*5^2
1224 -> 8.28e-05  (+ 24, +3.7%)  2^3*3^2*17
1296 -> 8.602e-05 (+ 72, +3.9%)  2^4*3^4
1320 -> 9.437e-05 (+ 24, +9.7%)  2^3*3*5*11
1368 -> 9.824e-05 (+ 48, +4.1%)  2^3*3^2*19
1392 -> 0.0001022 (+ 24, + 4%)   2^4*3*29
1416 -> 0.0001087 (+ 24, +6.4%)  2^3*3*59
1512 -> 0.0001112 (+ 96, +2.3%)  2^3*3^3*7
1584 -> 0.0001159 (+ 72, +4.2%)  2^4*3^2*11
1600 -> 0.0001217 (+ 16, +5.1%)  2^6*5^2
1680 -> 0.0001273 (+ 80, +4.6%)  2^4*3*5*7
1704 -> 0.0001396 (+ 24, +9.7%)  2^3*3*71
1728 -> 0.00014   (+ 24, +0.23%) 2^6*3^3
1776 -> 0.0001434 (+ 48, +2.4%)  2^4*3*37
1800 -> 0.0001439 (+ 24, +0.35%) 2^3*3^2*5^2
1872 -> 0.0001463 (+ 72, +1.7%)  2^4*3^2*13
1920 -> 0.000158  (+ 48, + 8%)   2^7*3*5
1944 -> 0.0001598 (+ 24, +1.2%)  2^3*3^5
1984 -> 0.0001648 (+ 40, +3.1%)  2^6*31
2048 -> 0.0001676 (+ 64, +1.7%)  2^11

Ĝis, m
Use on large mul_fft [was: New mulmod_bknp1]
Il 2022-04-15 19:56 Marco Bodrato ha scritto:
> Ciao,
> Il 2022-02-15 11:48 Marco Bodrato ha scritto:
>> I pushed some new functions to compute the product (and square) modulo
>> B^nn+1, for small values of the exponent nn.
> Currently that code is used by two functions. One is mulmod_bnm1,

The second one is mul_fft, thanks to the patch named "mpn/generic/mul_fft.c: Use _bknp1," [...], available at https://gmplib.org/repo/gmp/rev/f9cbcda05f7e

I attach an image (fakte+eble_18327.png), showing the relative speed of the code after/before the patch. The green line represents the gain with the current code. The orange one is estimated, using not the actual time of the function for a given size, but the best timing among the usable sizes.

Even if also in this range we see the effect "interesting improvement for some sizes, no news for some other sizes", the graph is not so "noisy". The two lines (green and orange) show more or less the same shape. Working on a better criterion to choose the mulmod size for large sizes is probably not worth doing.

Maybe something can be done on the FFT side? I didn't try.

Ĝis, m
Re: mul_fft, cleaning up some details of the code
Ciao Paul,

Il 2022-03-08 16:20 Paul Zimmermann ha scritto:
>> Uhm, the last line of code just before those is:
>>   cc = mpn_sub_1 (r, r, m, cc) + 1; /* add 1 to cc instead of rd since rd might overflow */
> it seems you are right!

Well, I pushed a clean-up for that portion of the code too...
https://gmplib.org/repo/gmp/rev/84893dca3e6c

Ĝis, m
Re: mul_fft, cleaning up some details of the code
Ciao Paul,

Il 2022-03-08 12:56 Paul Zimmermann ha scritto:
>> Since you deeply know how this code works, I ask you one more question.
>> The last lines of the function mpn_fft_mul_2exp_modF (branch m < n) contain:
>>
>>   /* now subtract cc and rd from r[m..n] */
>>   r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
>>   r[n] -= mpn_sub_1 (r + m, r + m, n - m, rd);
>>   if (r[n] & GMP_LIMB_HIGHBIT)
>>     r[n] = mpn_add_1 (r, r, n, CNST_LIMB(1));
>>
>> This code assumes that there can only be a single borrow. Is it correct?
>
> yes there can only be a single borrow, since cc is 0 or 1,

Uhm, the last line of code just before those is:

  cc = mpn_sub_1 (r, r, m, cc) + 1; /* add 1 to cc instead of rd since rd might overflow */

So that I'd say that cc is 1 or 2. Then the case cc=2, m=n-1, r[m]=0, and rd=GMP_NUMB_MAX seems very very unlikely, but possible.

> if r[n] is non zero after r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc),
> this can only happen for cc=1 and {r+m,n-m} = 000...000 before the
> mpn_sub_1 call, and {r+m,n-m} = 111...111 after, then the second
> mpn_sub_1 call cannot produce a borrow.

Of course a second borrow may only happen if n-m=1, i.e. if {r+m,n-m} contains a single limb.

Ĝis, m
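That corner case is easy to poke at with a toy model. A sketch in Python, using 4-bit "limbs" and a naive stand-in for mpn_sub_1 (this is not GMP code, just a model of the borrow propagation):

```python
B = 16  # toy limb base, standing in for 2^GMP_NUMB_BITS

def sub_1(r, s):
    """Naive stand-in for mpn_sub_1: subtract the single limb s from the
    little-endian limb list r, in place; return the outgoing borrow."""
    for i in range(len(r)):
        if s == 0:
            return 0
        r[i] -= s
        if r[i] >= 0:
            return 0
        r[i] += B  # wrap the limb, propagate a borrow of 1
        s = 1
    return s

# The suspected corner case: a single limb (n - m == 1), cc = 2,
# r[m] = 0 and rd = B - 1 (i.e. GMP_NUMB_MAX): both subtractions borrow.
r = [0]
assert sub_1(r, 2) == 1      # first borrow (cc)
assert sub_1(r, B - 1) == 1  # a second borrow (rd)

# With two or more limbs and cc <= 2, a double borrow cannot happen.
for v in range(B * B):
    for cc in (1, 2):
        for rd in range(B):
            r = [v % B, v // B]
            assert not (sub_1(r, cc) and sub_1(r, rd))
```

The exhaustive loop at the end confirms the claim in the message: with at least two limbs the first borrow leaves the low part near all-ones, so the second subtraction cannot borrow again.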
Re: mul_fft, cleaning up some details of the code
Ciao Paul,

Il 2022-03-07 10:28 Paul Zimmermann ha scritto:
>> Date: Sun, 06 Mar 2022 11:14:49 +0100
>> From: Marco Bodrato
>> Specifically I'd focus into a suspicious piece of code, shared by
>> both our current code and Vivien's.
>>
>>   if (cy >= 0)
>>     cy = mpn_add_1 (tmp, tmp, Kl, cy);
>>   else
>>     cy = mpn_sub_1 (tmp, tmp, Kl, -cy);
>
> (resp. added) at tmp[0]. Here are some comments added to the code
> (from GMP 6.2.1) and a fix:
>
>     if (cy >= 0)
>       cy = mpn_add_1 (tmp, tmp, Kl, cy);
> +    /* cy is now the carry at tmp[Kl] */
>     else
> +      {
>         cy = mpn_sub_1 (tmp, tmp, Kl, -cy);
> +        /* cy is now the borrow at tmp[Kl] */
> +        if (cy)
> +          cy = mpn_add_1 (tmp, tmp, Kl, cy);
> +        /* cy is now the carry at tmp[Kl] */
> +      }

Your fix of course is correct, that's the full normalization for the else branch, but for that branch only. The patch I pushed yesterday introduces an initial tmp[Kl] = 0; and then uses INCR_U or DECR_U:

  if (cy >= 0)
    MPN_INCR_U (tmp, Kl + 1, cy);
  else
    {
      tmp[Kl] = 1;
      MPN_DECR_U (tmp, Kl + 1, -cy - 1);
    }

The resulting value in tmp may be almost any value with Kl limbs and an additional high 0 or 1.

Since you deeply know how this code works, I ask you one more question. The last lines of the function mpn_fft_mul_2exp_modF (branch m < n) contain:

  /* now subtract cc and rd from r[m..n] */
  r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
  r[n] -= mpn_sub_1 (r + m, r + m, n - m, rd);
  if (r[n] & GMP_LIMB_HIGHBIT)
    r[n] = mpn_add_1 (r, r, n, CNST_LIMB(1));

This code assumes that there can only be a single borrow. Is it correct? I'm going to change them into the following:

  r[n] = 2;
  MPN_DECR_U (r + m, n - m + 1, cc);
  MPN_DECR_U (r + m, n - m + 1, rd);
  if (UNLIKELY ((r[n] -= 2) != 0))
    {
      mp_limb_t cy = -r[n];
      /* cy should always be 1, but we did not check if the case m=n-1,
	 r[m]=0, cc+rd>GMP_NUMB_MAX+1, and then cy = 2, is actually
	 possible. */
      r[n] = 0;
      MPN_INCR_U (r, n + 1, cy);
    }

If we really can assume that cc+rd <= GMP_NUMB_MAX+1, the code could be simpler:

  r[n] = 1;
  MPN_DECR_U (r + m, n - m + 1, cc);
  MPN_DECR_U (r + m, n - m + 1, rd);
  if (UNLIKELY (r[n] == 0))
    MPN_INCR_U (r, n + 1, 1);
  else
    r[n] = 0;

Ĝis, m
--
http://bodrato.it/papers/
Re: mul_fft, cleaning up some details of the code
Ciao,

Il 2022-03-06 12:16 Torbjörn Granlund ha scritto:
> Marco Bodrato writes:
>> The comment before the mpn_mul_fft_decompose function says "We must
>> have nl <= 2*K*l", this means that we should never have
>> ((dif = nl - Kl) > Kl), and the code in that branch should never be used.
> I noticed that at some point too, and have had a patch lying around
> for quite a while:

You basically propose to remove the branch. I disabled it when we don't WANT_OLD_FFT_FULL, and corrected it anyway.

> (Not sure about the separate FIXME change therein; it looks strange
> to me now.)

I'm not sure I understand it...

> Feel free to clean this up.

Pushed a patch.

Ĝis, m
--
http://bodrato.it/papers/
mul_fft, cleaning up some details of the code
Ciao,

I looked into the code published by Samuel Vivien. I realised that in mul_fft there are a lot of _add_1 and _sub_1. At least some of them can be easily replaced by _INCR_U or _DECR_U...

Specifically I'd focus into a suspicious piece of code, shared by both our current code and Vivien's. The function mpn_mul_fft_decompose has a branch "if (dif > Kl)", that ends with the following lines:

  if (cy >= 0)
    cy = mpn_add_1 (tmp, tmp, Kl, cy);
  else
    cy = mpn_sub_1 (tmp, tmp, Kl, -cy);

Can those lines be correct? Can the value in cy be used after this code both if it contains the carry or if it contains the borrow of the operation on tmp? I believe this is an error, I'd change the last line into

  cy = 1 - mpn_sub_1 (tmp, tmp, Kl, -cy - 1);

and I propose the attached patch changing this and some other _1 functions.

This is not really needed to solve a bug. The comment before the mpn_mul_fft_decompose function says "We must have nl <= 2*K*l", this means that we should never have ((dif = nl - Kl) > Kl), and the code in that branch should never be used. According to the coverage analysis, the code is not explored by the tests:
https://gmplib.org/devel/lcov/shell/var/tmp/lcov/gmp/mpn/mul_fft.c.gcov#691

The "bug" is never triggered. Maybe the branch could be used if someone re-enables mpn_mul_fft_full and uses it for very unbalanced (more than 4x1) operands?

Ĝis, m

diff -r 6ff25532f2a4 mpn/generic/mul_fft.c
--- a/mpn/generic/mul_fft.c	Fri Mar 04 10:03:38 2022 +0100
+++ b/mpn/generic/mul_fft.c	Sun Mar 06 09:41:58 2022 +0100
@@ -237,14 +237,14 @@
       r[n] = 0;
       /* cc < 2^sh <= 2^(GMP_NUMB_BITS-1) thus no overflow here */
-      cc++;
-      mpn_incr_u (r, cc);
+      ++cc;
+      MPN_INCR_U (r, n + 1, cc);
 
-      rd++;
+      ++rd;
       /* rd might overflow when sh=GMP_NUMB_BITS-1 */
-      cc = (rd == 0) ? 1 : rd;
+      cc = rd + (rd == 0);
       r = r + m + (rd == 0);
-      mpn_incr_u (r, cc);
+      MPN_INCR_U (r, n + 1 - m - (rd == 0), cc);
     }
   else
     {
@@ -284,7 +284,10 @@
       r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
       r[n] -= mpn_sub_1 (r + m, r + m, n - m, rd);
       if (r[n] & GMP_LIMB_HIGHBIT)
-	r[n] = mpn_add_1 (r, r, n, CNST_LIMB(1));
+	{
+	  r[n] = 0;
+	  MPN_INCR_U (r, n + 1, CNST_LIMB(1));
+	}
     }
 }
@@ -560,7 +563,9 @@
 	   */
 	  tp[0] += cc;
 	}
-      a[n] = mpn_sub_n (a, tp, tpn, n) && mpn_add_1 (a, a, n, CNST_LIMB(1));
+      cc = mpn_sub_n (a, tp, tpn, n);
+      a[n] = 0;
+      MPN_INCR_U (a, n + 1, cc);
     }
   }
   TMP_FREE;
@@ -657,8 +662,7 @@
     }
   /* remains to subtract {ap+n, l} from {rp, n+1} */
-  cc = mpn_sub_n (rp, rp, ap + n, l);
-  rpn -= mpn_sub_1 (rp + l, rp + l, n - l, cc);
+  rpn -= mpn_sub (rp, rp, n, ap + n, l);
   if (rpn < 0) /* necessarily rpn = -1 */
     rpn = mpn_add_1 (rp, rp, n, CNST_LIMB(1));
   return rpn;
@@ -688,7 +692,10 @@
   tmp = TMP_BALLOC_LIMBS(Kl + 1);
 
-  if (dif > Kl)
+  tmp[Kl] = 0;
+  /* The comment "We must have nl <= 2*K*l." says that
+     ((dif = nl - Kl) > Kl) should never happen. */
+  if (UNLIKELY (dif > Kl))
     {
       int subp = 0;
@@ -713,16 +720,18 @@
       else
	 cy -= mpn_add (tmp, tmp, Kl, n, dif);
       if (cy >= 0)
-	cy = mpn_add_1 (tmp, tmp, Kl, cy);
+	MPN_INCR_U (tmp, Kl + 1, cy);
       else
-	cy = mpn_sub_1 (tmp, tmp, Kl, -cy);
+	{
+	  tmp[Kl] = 1;
+	  MPN_DECR_U (tmp, Kl + 1, -cy - 1);
+	}
     }
   else /* dif <= Kl, i.e. nl <= 2 * Kl */
     {
       cy = mpn_sub (tmp, n, Kl, n + Kl, dif);
-      cy = mpn_add_1 (tmp, tmp, Kl, cy);
+      MPN_INCR_U (tmp, Kl + 1, cy);
     }
-  tmp[Kl] = cy;
   nl = Kl + 1;
   n = tmp;
 }
@@ -755,7 +764,7 @@
 static mp_limb_t
 mpn_mul_fft_internal (mp_ptr op, mp_size_t pl, int k,
-		      mp_ptr *Ap, mp_ptr *Bp, mp_ptr A, mp_ptr B,
+		      mp_ptr *Ap, mp_ptr *Bp, mp_ptr unusedA, mp_ptr B,
		      mp_size_t nprime, mp_size_t l, mp_size_t Mp,
		      int **fft_l, mp_ptr T, int sqr)
 {
@@ -797,9 +806,7 @@
	  j = (K - i) & (K - 1);
 
-	  if (mpn_add_n (n, n, Bp[j], nprime + 1))
-	    cc += mpn_add_1 (n + nprime + 1, n + nprime + 1,
-			     pla - sh - nprime - 1, CNST_LIMB(1));
+	  cc += mpn_add (n, n, pla - sh, Bp[j], nprime + 1);
	  T[2 * l] = i + 1; /* T = (i + 1)*2^(2*M) */
	  if (mpn_cmp (Bp[j], T, nprime + 1) > 0)
	    { /* subtract 2^N'+1 */
@@ -825,8 +832,7 @@
	}
       else
	{
-	  cc = mpn_sub_1 (p + pla - pl, p + pla - pl, pl, cc);
-	  ASSERT (cc == 0);
+	  MPN_DECR_U (p + pla - pl, pl, cc);
	}
     }
   else
@@ -918,18 +924,17 @@
   A = TMP_BALLOC_LIMBS (K * (nprime + 1));
   Ap = TMP_BALLOC_MP_PTRS (K);
+  Bp = TMP_BALLOC_MP_PTRS (K);
   mpn_mul_fft_decompose (A, Ap, K, nprime, n, nl, l, Mp, T);
   if (sqr)
     {
       mp_size_t pla;
       pla = l * (K - 1) + nprime + 1; /* number of required limbs for p */
       B = TMP_BALLOC_LIMBS (pla);
-      Bp = TMP_BALLOC_MP_PTRS (K);
     }
   else
     {
Re: New mulmod_bknp1
Ciao,

Il 2022-02-23 12:35 Marco Bodrato ha scritto:
> ... then maybe I was wrong when I wrote that we should not trade a
> factor 3 with a factor 2...

Yes, I think I was wrong. I pushed to the repository the use of the _bknp1 (mod B^kn+1) code on the +1 side of the _bnm1 multiplication code (mod B^2n-1).

And here are the measures by tune/speed on our development machine "shell". The _rounded measures use a larger size, rounded to some multiple of some power of two. Actually, they are not faster than the non-rounded size. Moreover, before the patch the _rounded cost was monotonous, now it's not. The rounding strategy needs a revision.

[bodrato@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -p1 -s 72-900 -rc -t36 mpn_mulmod_bnm1_rounded mpn_mulmod_bnm1
overhead 5.84 cycles, precision 1 units of 2.86e-10 secs, CPU freq 3500.09 MHz
        mpn_mulmod_bnm1_rounded mpn_mulmod_bnm1
 72       4939.99  #0.9984
108      10049.80  #0.9249
144      12718.57  #0.9997
180      22397.62  #0.8529
216      23185.73  #0.9993
252      36533.91  #0.8669
288      34125.65  #0.9981
324      56029.22  #0.8225
360      48846.70  #0.
396      63004.78  #0.9731
432      61179.69  #0.9996
468      95840.50  #0.8255
504     #80506.97   1.0001
540     113637.14  #0.8538
576     #92016.26   1.0002
612     127849.99  #0.9169
648    #115918.53   1.
684     161355.14  #0.8673
720     130868.85  #0.9980
756     166164.06  #0.9613
792     197147.66  #0.7897
828     196738.21  #0.9304
864     212050.34  #0.9630
900     217971.55  #0.9548

Ĝis, m
--
http://bodrato.it/papers/
Re: binvert_limb speedup on 64 bit machines with UHWtype
Ciao,

Il 2022-02-27 16:52 Marco Bodrato ha scritto:
> Il 2022-02-25 17:06 John Gatrell ha scritto:
>> I tested using UHWtype in the macro for binvert_limb. On a 64 bit
>> machine one of my programs gained a 3% speedup. On a 32 bit machine,
>> there was no
> Should we use uint8_fast_t, uint16_fast_t, uint32_fast_t for the
> different levels, and let the compiler choose? :-D

I tried code with uint_fast types, but it seems that the compiler is not choosing the faster type, the 64-bits type is always used :-(

You should try to store also the 32-bits result into the half-type. I mean: try replacing the following two lines in your code

  __inv = 2 * __hinv - __hinv * __hinv * __n; /* 32 */ \
  __inv = 2 * __inv - __inv * __inv * __n; /* 64 */ \

with

  __hinv = 2 * __hinv - __hinv * __hinv * __n; /* 32 */ \
  __inv = 2 * (mp_limb_t)__hinv - (mp_limb_t)__hinv * __hinv * __n; /* 64 */ \

Ĝis, m
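The step being discussed is Newton's iteration for an inverse modulo a power of 2: each pass through inv -> 2*inv - inv*inv*n doubles the number of correct low bits. That doubling can be checked with plain integers; a quick sketch in Python (the trivial 3-bit seed here is illustrative, not the table lookup used by gmp-impl.h):

```python
MASK64 = (1 << 64) - 1

def binvert_limb(n):
    """Inverse of odd n mod 2^64 by Newton's iteration: correct
    low bits go 3 -> 6 -> 12 -> 24 -> 48 -> 96 >= 64."""
    assert n & 1
    inv = n  # n*n == 1 (mod 8), so n is its own inverse to 3 bits
    for _ in range(5):
        inv = (2 * inv - inv * inv * n) & MASK64
    return inv

for n in (1, 3, 0x123456789ABCDEF, MASK64):
    assert (binvert_limb(n) * n) & MASK64 == 1
```

The suggestion in the message is exactly about this chain: the intermediate steps only need the narrower width, so keeping them in a half-size type lets the compiler use cheaper multiplications, widening only for the last step.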
Re: binvert_limb speedup on 64 bit machines with UHWtype
Ciao,

Il 2022-02-27 18:13 Marco Bodrato ha scritto:
> I did never look into that file :-)

I inserted there a few more versions of binvert_limb. The attached patch is only for testing, not to be pushed (I used uint_fast##_t types).

On shell I get the following:

@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -p1 -s 1 -rc binvert_limb binvert_limb_sec binvert_limb_pipe binvert_limb_uintfast
overhead 5.84 cycles, precision 1 units of 2.86e-10 secs, CPU freq 3500.09 MHz
    binvert_limb  binvert_limb_sec  binvert_limb_pipe  binvert_limb_uintfast
1   28.39         1.0485            #0.8930            0.

The _sec version is 5% slower, and the _pipe one is 10% faster than the current.

Ĝis, m

diff -r 1cafba189d5a tune/modlinv.c
--- a/tune/modlinv.c	Sun Feb 27 15:10:38 2022 +0100
+++ b/tune/modlinv.c	Sun Feb 27 18:35:48 2022 +0100
@@ -1,7 +1,7 @@
 /* Alternate implementations of binvert_limb to compare speeds. */
 
 /*
-Copyright 2000, 2002 Free Software Foundation, Inc.
+Copyright 2000, 2002, 2022 Free Software Foundation, Inc.
 
 This file is part of the GNU MP Library.
 
@@ -29,11 +29,133 @@
 GNU Lesser General Public License along with the GNU MP Library.  If not,
 see https://www.gnu.org/licenses/.
*/ -#include #include "gmp-impl.h" #include "longlong.h" #include "speed.h" +#define binvert_limb_uintfast(inv,n) \ + do { \ +mp_limb_t __n = (n); \ +mp_limb_t __inv; \ +uint_fast8_t __inv8; \ +ASSERT ((__n & 1) == 1); \ + \ +__inv8 = binvert_limb_table[(__n & 0xff)/2]; /* 8 */ \ +if (GMP_NUMB_BITS > 8) { \ + uint_fast16_t __inv16 = 2 * (uint_fast16_t)__inv8 - (uint_fast16_t)__inv8 * __inv8 * (uint_fast16_t)__n; \ +if (GMP_NUMB_BITS > 16) { \ + uint_fast32_t __inv32 = 2 * (uint_fast32_t)__inv16 - (uint_fast32_t)__inv16 * __inv16 * (uint_fast32_t)__n; \ +if (GMP_NUMB_BITS > 32) { \ +__inv = 2 * (mp_limb_t)__inv32 - (mp_limb_t)__inv32 * __inv32 * __n; \ +} else { __inv = __inv32; }; \ +} else { __inv = __inv16; }; \ +} else { __inv = __inv8; }; \ + \ +if (GMP_NUMB_BITS > 64) \ + { \ + int __invbits = 64; \ + do {\ + __inv = 2 * __inv - __inv * __inv * __n; \ + __invbits *= 2; \ + } while (__invbits < GMP_NUMB_BITS);\ + } \ + \ +ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1); \ +(inv) = __inv & GMP_NUMB_MASK; \ + } while (0) + +/* An (imperfect) copy of the binvert function contained in + mpn/generic/sec_powm.c . 
+ */ + +#if GMP_LIMB_BITS <= 32 +#define binvert_limb_sec(inv,n) \ + do {\ +mp_limb_t __n = (n); \ +mp_limb_t __inv, __t; \ +ASSERT ((__n & 1) == 1); \ +/* 3 + 2 -> 5 */ \ +__inv = __n + (((__n + 1) << 1) & 0x18); \ +\ +__t = __n * __inv; \ +/* 5 x 2 + 2 -> 12 */ \ +__inv = 2*__inv - __inv*__t + ((__inv<<10)&-(__t&(1<<5))); \ +\ +if (GMP_NUMB_BITS > 12) \ + {\ +__t = __n * __inv - 1; \ + /* 12 x 3 -> 36 */ \ + __inv += __inv * __t * (__t - 1); \ + }\ +\ +ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1); \ +(inv) = __inv & GMP_NUMB_MASK;\ + } while (0) +#else /* GMP_LIMB_BITS > 32, but actually > 48 is required */ +#define binvert_limb_sec(inv,n) \ + do {\ +mp_limb_t __n = (n); \ +mp_limb_t __inv, __t; \ +ASSERT ((__n & 1) == 1); \ +/* 3 + 2 -> 5 */ \ +__inv = __n + (((__n + 1) << 1) & 0x18); \ +\ +__t = __n * __inv; \ +/* 5 x 2 + 2 -> 12 */ \ +__inv = 2*__inv - __inv*__t + ((__inv<<10)&-(__t&(1<<5))); \ +\ +__t = __n * __inv - 1; \ +mp_limb_t __t2 = __t * __t; \ +/* 12 x 5 + 4 -> 64 */ \ +__inv *= (__t2+1)*(__t2-__t)+1 -((__t<<48)&-(__t&(1<<12))); \ +\ +/* 64 -> 128 -> 256 -> ... */\ +for (int __b = (GMP_NUMB_BITS - 1) >> 6; __b != 0; __b >>= 1) \ + __inv = 2 * __inv - __inv * __inv * __n; \ +\ +ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1); \ +(inv) = __inv & GMP_NUMB_MASK;\ + } while (0) +#endif + +/* Like the standard version in gmp-impl.h, but with a different path + for bit sizes larger than 32, with concurrent multiplications. */ + +#define binvert_limb_pipe(inv,n) \ + do { \ +mp_limb_t __n = (n); \ +mp_limb_t __inv; \ +ASSERT ((__n & 1) == 1); \ + \ +__inv = binvert_limb_table[(__n & 0xFF) / 2]; /* 8 */ \ +if (GMP_NUMB_BITS <= 32) \ + { \ + if (GMP_NUMB_BITS > 8) \ + __inv = 2
Re: binvert_limb speedup on 64 bit machines with UHWtype
Ciao Torbjörn,

Il 2022-02-27 10:53 Torbjörn Granlund ha scritto:
> Perhaps one could write it (n & 0xff)/2 and get better code

Actually, this trick is already used! In tune/modlinv.c :-/ The following line dates back to 2002:

  __inv = binvert_limb_table[(__n&0xFF)/2]; /* 8 */ \

I never looked into that file :-)

Ĝis, m
Re: binvert_limb speedup on 64 bit machines with UHWtype
Ciao John Gatrell,

Il 2022-02-25 17:06 John Gatrell ha scritto:
> Hi everyone. I'm new here so I don't know how to submit a new gmp-impl.h

This list is the correct place for such a discussion.

> I tested using UHWtype in the macro for binvert_limb. On a 64 bit
> machine one of my programs gained a 3% speedup. On a 32 bit machine,
> there was no

Unfortunately, on different platforms, we have different speeds. Should we use uint8_fast_t, uint16_fast_t, uint32_fast_t for the different levels, and let the compiler choose? :-D

Anyway, if you are curious, you can also try what happens with a table-lookup-free binvert_limb function. You can find it, named sec_binvert_limb, in the file mpn/generic/sec_powm.c .

There are also architectures where smaller operations are NOT faster, but, on the other side, more than one mul instruction can run concurrently. For those architectures I'd suggest the following; any couple of multiplications beyond the 32-bits limit can be computed in parallel.

#define binvert_limb(inv,n)                                     \
  do {                                                          \
    mp_limb_t  __n = (n);                                       \
    mp_limb_t  __inv;                                           \
    ASSERT ((__n & 1) == 1);                                    \
                                                                \
    __inv = binvert_limb_table[(__n/2) & 0x7F]; /*  8 */        \
    if (GMP_NUMB_BITS <= 32)                                    \
      {                                                         \
        if (GMP_NUMB_BITS > 8)                                  \
          __inv = 2 * __inv - __inv * __inv * __n;              \
        if (GMP_NUMB_BITS > 16)                                 \
          __inv = 2 * __inv - __inv * __inv * __n;              \
      }                                                         \
    else                                                        \
      {                                                         \
        mp_limb_t  __t, __t2, __p;                              \
        int __invbits = 32;                                     \
        __t = __n * __inv - 1;      /* x */                     \
        __t2 = __t * __t;           /* x^2 */                   \
        __p = __inv * (__t - __t2); /* (1-x^2)x/(x+1) */        \
                                                                \
        while (__invbits < GMP_NUMB_BITS - 8)                   \
          {                                                     \
            __p *= __t2 + 1;        /* (1-x^{2^n})x/(x+1) */    \
            __t2 *= __t2;           /* x^{2^n} */               \
            __invbits <<= 1;                                    \
          }                                                     \
        /* __inv * (x^{2^{n+2}+1}-1)/(x+1) */                   \
        __inv -= __p * (__t2 + 1);                              \
      }                                                         \
                                                                \
    ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1);                \
    (inv) = __inv & GMP_NUMB_MASK;                              \
  } while (0)

In any case, for 64-bits machines, all the described functions use 6 multiplications :-) but with very different paths toward the desired result! :-D

Maybe you can try them on your platform... and tell us exactly which CPU you are using for your tests, and how they behave.

Ĝis, m
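The series bookkeeping in the "parallel" variant above can be verified with plain integers. With t = n*inv - 1 ≡ 0 (mod 2^8), the final step computes inv*(1 - t + t^2 - ... - t^7), and t^8 ≡ 0 (mod 2^64), so the result is exact. A sketch in Python modelling a 64-bit limb (the two Newton steps at the top stand in for the binvert_limb_table seed):

```python
MASK64 = (1 << 64) - 1

def binvert_pipe(n):
    """Pipelined inverse of odd n mod 2^64, following the proposed macro."""
    assert n & 1
    inv = n
    for _ in range(2):  # 3 -> 6 -> 12 bits; mask keeps an 8-bit seed
        inv = (2 * inv - inv * inv * n) & 0xFF
    t = (n * inv - 1) & MASK64       # x, with x == 0 (mod 2^8)
    t2 = (t * t) & MASK64            # x^2
    p = (inv * (t - t2)) & MASK64    # inv*(x - x^2)
    invbits = 32
    while invbits < 64 - 8:          # one pass for a 64-bit limb
        p = (p * (t2 + 1)) & MASK64
        t2 = (t2 * t2) & MASK64
        invbits <<= 1
    return (inv - p * (t2 + 1)) & MASK64

for n in (3, 0x123456789ABCDEF, MASK64):
    assert (binvert_pipe(n) * n) & MASK64 == 1
```

The point of the reshuffling is that p *= t2 + 1 and t2 *= t2 have no data dependency on each other, so a CPU with several multipliers can issue them concurrently.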
Re: New mulmod_bknp1
Ciao Niels and David,

Il 2022-02-23 09:14 ni...@lysator.liu.se ha scritto:
> N^3-1 = (N^2+N+1)/3 * 3^{k+1} * (N-1)/3^k
>
> Could be implemented as one multiply each mod N^2+N+1 and (N-1),
> followed by reduction mod (N^2+N+1)/3 and (N-1)/3^k at the end; these
> reductions correspond to small quotients and should be cheap (I
> suspect we have to require canonical reduction for CRT reconstruction
> to

Yes, there are many details that one needs to take into account to implement something like that...

Il 2022-02-22 22:55 David Harvey ha scritto:
> What about B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?
>
> Also, I suspect that some of these tricks are worth doing even if the
> factorisations cross the B boundaries. The extra shifting overhead is
> only linear.

I do not, but if someone has the time to try to implement some extensions on the _bnm1 side (B^n-1), I'd suggest to try at first the "cross the B boundaries" strategy. I mean, if we have H s.t. H^2 = B, I'd try to split

  B^33 - 1 = (H^33 - 1)(H^33 + 1)

One more observation. Currently, the library may require a product e.g. mod B^72-1. The current _bnm1 code splits the computation into

  B^72-1 = (B^36+1)(B^18+1)(B^9+1)(B^9-1)

I'd expect that more than half the time is spent computing mod B^36+1. Less than 1/4 of the time is spent mod B^18+1. Less than 1/8 of the time is spent for each of the two last factors, B^9+1 and B^9-1.

... then maybe I was wrong when I wrote that we should not trade a factor 3 with a factor 2...

Ĝis, m
Re: New mulmod_bknp1
Ciao David,

Il Mar, 22 Febbraio 2022 10:55 pm, David Harvey ha scritto:
> On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote:
>>> E.g, in this case we could try a top-level B^66 - 1 product, split in
>>> B^33+1 and B^33-1; then the former suits your new algorithm well, but
>>> the former can't be recursively split (at least not on a B boundary).
>> I fully agree.
> What about
>
> B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?

Actually, B is a number. And if B is a power of 4 (there is an even number of bits in a limb, as usually happens, e.g. 32, or 64) then (B^n-1) and (B^2n+B^n+1) are not coprime, because both are divisible by 3. And we can not use CRT.

> Also, I suspect that some of these tricks are worth doing even if the
> factorisations cross the B boundaries. The extra shifting overhead is
> only linear.

Maybe. One should write the code and try.

Ĝis, m
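The obstruction can be checked directly: with x = B^11, we have x^2+x+1 = (x-1)(x+2) + 3, so gcd(B^11-1, B^22+B^11+1) = gcd(x-1, 3), which is 3 exactly when B^11 ≡ 1 (mod 3) — as it is for any B that is a power of 4. A quick check in Python (the base H for the odd-power case is my illustrative choice):

```python
from math import gcd

B = 1 << 64                        # a limb base that is a power of 4

f1 = B**11 - 1
f2 = B**22 + B**11 + 1
assert B**33 - 1 == f1 * f2        # the factorisation itself is fine...
assert f1 % 3 == 0 and f2 % 3 == 0 # ...but both factors are divisible by 3
assert gcd(f1, f2) == 3            # so they are not coprime: no plain CRT

# With an odd power of 2 as base the obstruction disappears,
# since then H**11 % 3 == 2 and gcd(H**11 - 1, 3) == 1:
H = 1 << 31
assert gcd(H**11 - 1, H**22 + H**11 + 1) == 1
```

This is exactly why the "cross the B boundaries" idea (working with a half-limb base H with H^2 = B) changes the picture: the split points no longer all land on values ≡ 1 (mod 3).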
Re: New mulmod_bknp1
Ciao,

Il Mar, 22 Febbraio 2022 8:04 pm, Niels Möller ha scritto:
> Marco Bodrato writes:
>> Simply, if a multiplication mod B^{3n}+1 is needed, the code computes
>> - a product mod B^{n}+1
>> - a product mod B^{2n}-B^{n}+1
>> - with CRT, the desired result is obtained.
> Ok, and can the second product be computed more efficiently than full
> product + mod operation?

No. At least, I don't know; and it's currently implemented as a product+mod. But, if M(n) (the cost of an nxn product) is more than linear in n, then it's a good idea to split the computation into M(a)+M(b) where a+b=n.

>> This is generalised for multiplications mod B^{kn}+1 for k in
> Intuitively, I'd expect the gain is largest for k = 3 (since for larger
> k, we get even further away from splitting in half). Is that right?

Yes, indeed!

> It seems a bit tricky to take this into account in
> mpn_mulmod_bnm1_next_size, but maybe it's doable?
>
> E.g, in this case we could try a top-level B^66 - 1 product, split in
> B^33+1 and B^33-1; then the former suits your new algorithm well, but
> the former can't be recursively split (at least not on a B boundary). If

I fully agree. MULMOD_BNM1_THRESHOLD is usually small, and the expected gain on the bnm1 side is larger than on the bnp1 side. We should never trade a 2 factor with a 3 factor (or 8 with 9...). But, assume MULMOD_BNM1_THRESHOLD > 15, should we prefer the size 14*32 or the size 15*32? Maybe the latter!

> And we also have mpn_fft_next_size, but I'm not familiar with how that
> works.

I believe that code is quite simple, it chooses the first multiple of 2^k, taking k from the "best k for a given range" table. But the question "how does the new code interact with the code that generates the 'best k' table?" seems more complex.

Ĝis, m
Re: New mulmod_bknp1
Ciao,

Il 2022-02-21 01:37 Torbjörn Granlund ha scritto:
> I am too busy to examine the code to see what you've done. Perhaps you
> could outline the algorithms here?

Nothing special... Simply, if a multiplication mod B^{3n}+1 is needed, the code computes
- a product mod B^{n}+1
- a product mod B^{2n}-B^{n}+1
- with CRT, the desired result is obtained.

This is generalised for multiplications mod B^{kn}+1 for k in {3,5,7,13,17}.

> Is n = 3^t-k now slower than n' = 3^t for small k (with k mod 3 != 0)?

Yes. But that's true for the product mod B^n+1. Then we could zero-pad such operands...

If a product mod B^64-1 is required to mulmod_bnm1, then the function computes two multiplications: mod B^32-1, and mod B^32+1. Computing a product mod B^33+1 is now faster than computing mod B^32+1... ...but the product mod B^33+1 is simply not useful, the calling function needs the result mod B^32+1!

Ĝis, m
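The split works because B^{3n}+1 = (B^{n}+1)(B^{2n}-B^{n}+1), and the two factors are coprime when B^n ≡ 1 (mod 3), as with a 64-bit limb. A small end-to-end check of the CRT recombination in Python, using B = 2^64 and n = 2 (mulmod_b3np1 is my illustrative name, not GMP's):

```python
from math import gcd

B, n = 1 << 64, 2
M  = B**(3 * n) + 1
f1 = B**n + 1
f2 = B**(2 * n) - B**n + 1
assert f1 * f2 == M and gcd(f1, f2) == 1

def mulmod_b3np1(a, b):
    """Multiply mod B^{3n}+1 via the two smaller products plus CRT."""
    r1 = (a % f1) * (b % f1) % f1   # product mod B^n+1
    r2 = (a % f2) * (b % f2) % f2   # product mod B^{2n}-B^n+1
    # CRT: the unique r < f1*f2 with r = r1 (mod f1), r = r2 (mod f2)
    return r1 + f1 * ((r2 - r1) * pow(f1, -1, f2) % f2)

a = 0x123456789ABCDEF0123456789ABCDEF
b = M - 12345
assert mulmod_b3np1(a, b) == a * b % M
```

The point of the trade-off in the message is visible in the shapes: the n-limb product and the 2n-limb product together cost less than one 3n-limb product whenever M(n) grows faster than linearly.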
Re: New mulmod_bknp1
Ciao,

Il 2022-02-15 11:48 Marco Bodrato ha scritto:
> I pushed some new functions to compute the product (and square) modulo
> B^nn+1, for small values of the exponent nn. For large values the
> already available fft_mul should be used.

A possible use is replacing the last-level point-wise multiplication of mul_fft. I tried to do that.

The new functions actually require nn=k*n, where k can be 3 or in {5, 7, 13, 17}. This means that it can be used only when mul_fft falls into such a size. The result is that for some sizes there is an effect, for other sizes there is none. Not optimal, but an improvement anyway... I attach a graph (fft_with_bknp1.png) obtained plotting the output of

  tune/speed -s 256-131072 -t128 mpn_mul_fft

before (_old) and after (_new) the patch.

Unfortunately the current tuning criterion does not interact well with the added code. If the library is re-tuned, then there are sizes where a worse combination is chosen and the effect is a slowdown. I attach another graph (fft_retuned.png) with the timings obtained as above; the "_retuned" line was measured after re-tuning (make -C tune tune).

I don't know how the generation of FFT_TABLE works, should we change it somehow?

Ĝis, m
New mulmod_bknp1
Ciao,

I pushed some new functions to compute the product (and square) modulo B^nn+1, for small values of the exponent nn. For large values the already available fft_mul should be used.

The new functions actually require nn=k*n, where k can be 3 or in {5, 7, 13, 17}. Also k=11 can be supported, but it's currently disabled.

I added also tune/speed support, but the problem with the analysis is that the function is defined for some values only... Here are some results:

@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -Cr -s 75-85\,240-250\,725-735 mpn_mul_n mpn_mulmod_bknp1
overhead 7.15 cycles, precision 1 units of 2.86e-10 secs, CPU freq 3500.09 MHz
        mpn_mul_n  mpn_mulmod_bknp1
 75      159.4800   #0.9715
 76     #167.2895    n/a
 77     #167.1558   1.0793
 78      193.8462   #0.7311
 79     #169.8734    n/a
 80      182.1500   #0.8597
 81      196.0617   #0.7721
 82     #170.8780    n/a
 83     #170.8916    n/a
 84      169.2024   #0.9344
 85      188.8235   #0.7898
240      258.7792   #0.7474
241     #254.4274    n/a
242      266.2355   #0.9051
243      260.2798   #0.7414
244     #256.3893    n/a
245      256.8449   #0.8595
246      255.7073   #0.7915
247      257.0324   #0.9904
248     #257.6532    n/a
249      257.2410   #0.7925
250      258.1480   #0.8535
725      378.2538   #0.8652
726      377.4793   #0.8362
727     #376.6836    n/a
728      376.2060   #0.9164
729      379.7462   #0.7682
730      379.1068   #0.8727
731      379.0109   #0.9805
732      377.9754   #0.8285
733     #379.6958    n/a
734     #377.6403    n/a
735      376.0204   #0.8094

As you can see, the new functions are particularly effective when there are many factors "3" in the size...

Integrating this in the current mod_bnm1 is easy, but the effect would be to accelerate the operations only for some sizes. This may add "noise" to the thresholds system. Some more experiments are needed. In the meanwhile, the base code will be tested by the great testing system of GMP :-)

Ĝis, m
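Since the new functions only apply when the size nn has a factor k in {3, 5, 7, 13, 17}, any integration (or a tuning loop) needs a quick applicability test first. A hypothetical sketch of that check in Python (choose_k is an illustrative name, not a GMP function, and "smallest k" is just one possible policy):

```python
def choose_k(nn):
    """Smallest supported k with nn = k*n, or None if _bknp1 cannot apply."""
    for k in (3, 5, 7, 13, 17):
        if nn % k == 0:
            return k
    return None

assert choose_k(81) == 3     # many factors of 3: the favourable case
assert choose_k(725) == 5    # 725 = 5^2 * 29
assert choose_k(64) is None  # no supported factor: plain code path
```

This mirrors the "n/a" rows in the measurements above: those are exactly the sizes where no supported k divides the operand size.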
Re: mpq_mul_ui
Ciao,

Il 2022-01-23 01:34 Marc Glisse ha scritto:
> Hello,
> What would you think of adding mpq_mul_ui, mpq_div_ui, mpq_ui_div, and
> also the _z versions?

I tried to think to the mini-mpq version of the _z flavor.

diff -r ed0406cf3c70 mini-gmp/mini-mpq.c
--- a/mini-gmp/mini-mpq.c	Wed Feb 02 19:16:36 2022 +0100
+++ b/mini-gmp/mini-mpq.c	Tue Feb 08 10:04:55 2022 +0100
@@ -395,6 +395,30 @@
 }
 
 void
+mpq_mul_z (mpq_t r, const mpq_t a, const mpz_t b)
+{
+  mpq_t t;
+  static const mp_limb_t u = 1;
+
+  mpq_mul (r, a, mpq_roinit_normal_nn (t, b->_mp_d, b->_mp_size, &u, 1));
+}
+
+void
+mpq_z_div (mpq_t r, const mpz_t b, const mpq_t a)
+{
+  mpq_t t;
+  mpq_mul_z (r, mpq_roinit_zz (t, mpq_denref (a), mpq_numref (a)), b);
+}
+
+void
+mpq_div_z (mpq_t r, const mpq_t a, const mpz_t b)
+{
+  mpq_t t;
+  mpq_z_div (r, b, a);
+  mpq_inv (r, r);
+}
+
+void
 mpq_div_2exp (mpq_t r, const mpq_t q, mp_bitcnt_t e)
 {
   mp_bitcnt_t z = mpz_scan1 (mpq_numref (q), 0);

It seems simple enough. It should support (but I'm not sure!) the reuse cases

  mpq_mul_z (B, A, mpq_denref (B))

and

  mpq_mul_z (A, A, Z)

but I fear it would not support

  mpq_mul_z (A, A, mpq_denref (A))

or

  mpq_mul_z (A, A, mpq_numref (A))

Those are actually equivalent respectively to

  mpz_set_ui (mpq_denref (A), 1);

and

  mpz_mul (mpq_numref (A), mpq_numref (A), mpq_numref (A))

But I think we should support all reuse cases, shouldn't we?

Ĝis, m
Re: Error handler for GMP?
Ciao,

Il 2021-06-12 16:49 Marco Bodrato ha scritto:
> Anyway, for internal coherence, I think that we should at least check
> that all the functions in the library that might "abort" do this using
> an "exception". For this, I propose the attached patch.

That patch ("Handle overflow in mpz_type through errno") was pushed.

We still have at least two functions directly using an abort(): __gmp_exception, and __gmp_invalid_operation. They live in two different files (errno.c and invalid.c respectively), that handle somehow differently how to raise() a signal. The first checks #ifdef SIGFPE, the second depends on #if HAVE_UNISTD_H, and #if ! HAVE_RAISE.

I'm tempted to remove the file invalid.c, and to keep errno.c only, with the function __gmp_invalid_operation() calling __gmp_exception as all the other exceptions do. Of course with a new GMP_ERROR_INVALID_FLOAT_OPERATION = 32 constant.

Any comment?

En passant: the previous patch removed the message printed to stderr. Should we recover it using the following?

void
__gmp_overflow_in_mpz (void)
{
  fprintf (stderr, "gmp: overflow in mpz type\n"); /* This line added. */
  __gmp_exception (GMP_ERROR_MPZ_OVERFLOW);
}

Should we print a message also for the other exceptions in errno.c?

Ĝis, m
Re: mpq_mul_ui
Ciao,

Il Mer, 26 Gennaio 2022 7:53 am, Marco Bodrato ha scritto:
> Il 2022-01-24 10:04 Torbjörn Granlund ha scritto:
>> Marc Glisse writes:
>>> What would you think of adding mpq_mul_ui, mpq_div_ui, mpq_ui_div,
>>> and also the _z versions?
>> That would make sense to me.
> I agree too, I just remember an observation about the naming convention:
> https://gmplib.org/list-archives/gmp-discuss/2018-August/006274.html

...but actually only the _singleui versions of _mul_ and _div_ are really useful. Moreover, to handle the _twouis version, the need to check if the given ui/ui fraction is normalised arises. Shall we start with the _z flavours?

On the other side, it might be useful to have functions to _add_ or _sub_ small fractions: e.g. +1/2, -4/3.

Ĝis, m
Re: mpq_mul_ui
Ciao,

Il 2022-01-24 10:04 Torbjörn Granlund ha scritto:
> Marc Glisse writes:
>> What would you think of adding mpq_mul_ui, mpq_div_ui, mpq_ui_div,
>> and also the _z versions?
> That would make sense to me.

I agree too, I just remember an observation about the naming convention:
https://gmplib.org/list-archives/gmp-discuss/2018-August/006274.html

Ĝis, m
Re: proposal to change the return type/value of mpz_primorial_ui
Ciao Shane,

Il 2021-11-11 23:58 Shane Neph ha scritto:
> First, thank you for your work. I will be using it. I'd like to see it
> or similar become part of the GMP library.

I'm not sure you can use it! :-D I forgot a licence, so the default is "all rights reserved". I send the code again, with a GPLv3+ licence, and a correction such that the function works also for the value 2^32-1 in a 32-bits environment.

Moreover, remember that this implementation uses functions that are internal, undocumented, and because of this subject to unexpected changes in the future.

Before seeing something like this in GMP, we have to at least decide the interface :-) To write the function, I got inspiration from a code by Seth Troisi, who used more than a call to the prime counting function to get the n-th prime number. Should we allow recycling parts of the computations between calls? How?

> I also would like to see the primorial function return that same value
> because it knows the value.

True, but this is not the only information that the function "knows" and might be useful. Why not returning, e.g., the larger prime?

> In my work, I use primorial in ways similar to factorials, and I use
> it over a factorial because it simply grows faster.

Not the implementation in GMP :-) mpz_primorial_ui is the "square free" version of mpz_fac_ui :-D Strictly smaller, when the argument is larger than 3.

Ĝis, m
--
http://bodrato.it/papers/

/* pi (N) -- Returns the number of primes in the range [1..N].

   Contributed to the GNU project by Marco Bodrato.

   THE FUNCTION IN THIS FILE IS INTERNAL WITH A MUTABLE INTERFACE.  IT IS
   ONLY SAFE TO REACH IT THROUGH DOCUMENTED INTERFACES.  IN FACT, IT IS
   ALMOST GUARANTEED THAT IT WILL CHANGE OR DISAPPEAR IN A FUTURE GNU MP
   RELEASE.

Copyright 2019-2021 Free Software Foundation, Inc.
This file was written for the GNU MP Library, and it is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. */ #include "gmp-impl.h" #include "longlong.h" #ifndef PI_LEGENDRE_THRESHOLD #define PI_LEGENDRE_THRESHOLD 8 #endif /* n_fto_bit (n) = ((n-1)&(-CNST_LIMB(2)))/3U-1 */ static mp_limb_t n_fto_bit (mp_limb_t n) { return ((n-5)|1)/3U; } /* n_fto_bit(x) = gmp_legendre_pi_phi_2 (x) - 2 */ static mp_size_t primesieve_size (mp_limb_t n) { return n_fto_bit(n) / GMP_LIMB_BITS + 1; } static mp_limb_t limb_apprsqrt (mp_limb_t x) { int s; ASSERT (x > 2); count_leading_zeros (s, x); s = (GMP_LIMB_BITS - s) >> 1; return ((CNST_LIMB(1) << (s - 1)) + (x >> 1 >> s)); } /* phi(a, 0) = a * phi(a, b) = phi(a, b-1) - phi(a/p_b, b-1) * phi(a, 1) = phi(a, 0) - phi(a/2, 0) = a - a/2 * phi(a, 2) = phi(a, 1) - phi(a/3, 1) = a - a/2 - (a/3 - a/6) ~= a/3 * phi(a, b) = phi(a, 2) - sum_{1 0); /* if (primes == 0) */ /* return res; */ /* if (res * 3 == n) n -= 2; */ unsigned count = 0; unsigned long i = 4, *sp = sieve; do { /* Loop on primes */ for (unsigned long b = i, x = ~ *(sp++); x != 0; b += 3, x >>= 1) if (x & 1) { /* b = 4 + 3*k : * if k = 2n, b = 4 (mod 6) is even, the prime is b+1, *and b+2 = 0 (mod 6) is composite; * if k = 2n+1, b = 1 (mod 6) is odd, b is the prime, *and b+1 (even), b+2 = 3 (mod 6), b+2+(b%2) (even) are composite. 
*/ unsigned long frac = n / (b | 1); if (frac <= (b + 2 + (b & 1))) return res - primes + count - (frac >= (b | 1)); if (count < 2) /* Directly compute, avoid recursion. */ res -= gmp_legendre_pi_phi_2 (frac) /* count == 0 */ - (count ? gmp_legendre_pi_phi_2 (frac / 5) : 0); /* count == 1 */ else res -= gmp_legendre_pi_phi(frac, count, sieve); if (primes == ++count) return res; } i += GMP_LIMB_BITS * 3; } while (1); } /* Takes a pre-sieved range of primes [5..maxprime]. * The range contains primes_in_sieve primes. * * Legendre showed that, if maxprime^2 >= n, then * primepi(n) = primepi(maxprime) + phi(n, primepi(maxprime)) - 1 * where phi (a, b) = numbers x <= a, not divisible by the first b primes. * * The sieve does not count 2 and 3, so that * primepi(maxprime) = primes_in_sieve + 2 * * Here we pass to pi_phi the prime index of b: primes_in_sieve. */ static long gmp_legendre_pi_ps (unsigned long n, unsigned primes_in_sieve, mp_ptr sieve) { #if (ULO
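The sieve machinery above stores one bit per candidate coprime to 2 and 3 (i.e. of the form 6k±1), and n_fto_bit is the map from such a candidate to its bit index; limb_apprsqrt is one AM-GM step, (2^s + x/2^s)/2 >= sqrt(x), so it never undershoots the integer square root. Both can be exercised standalone; a sketch assuming 64-bit limbs, with __builtin_clzll standing in for GMP's count_leading_zeros:

```c
#include <assert.h>
#include <stdint.h>

/* Same formula as n_fto_bit in the attachment: the candidates
   5, 7, 11, 13, 17, 19, ... (all 6k+-1) get bit indices 0, 1, 2, 3, ...  */
static uint64_t wheel_bit (uint64_t n)
{
  return ((n - 5) | 1) / 3;
}

/* Same idea as limb_apprsqrt: s ~ half the bit length of x, result is
   2^(s-1) + floor(x / 2^(s+1)), an approximation >= floor(sqrt(x)).
   Requires x > 2, as the original ASSERT demands.  */
static uint64_t apprsqrt (uint64_t x)
{
  int s = (64 - __builtin_clzll (x)) >> 1;
  return (UINT64_C(1) << (s - 1)) + (x >> 1 >> s);
}
```

By AM-GM the real value (2^s + x/2^s)/2 is at least sqrt(x), and the two floor divisions lose strictly less than 1, so the result is always at least floor(sqrt(x)) — which is all the Legendre code needs from its sieve bound.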
Re: proposal to change the return type/value of mpz_primorial_ui
Ciao,

On 2021-11-08 18:57, Marco Bodrato wrote:
> I got inspired by a piece of code that Seth Troisi sent a couple of years ago and I wrote a gmp_pi_ui(n) function with two variants: a trivial one (it calls gmp_primesieve) and an implementation of the Legendre strategy. I attach it for comments. Compiled with -DMAIN it prints the number of primes up to increasing powers of two. On my computer, with small ranges it is slower than GP/Pari; for large ranges, it is faster...

I checked the results against https://oeis.org/A007053 and https://oeis.org/A006880 up to 2^47. Everything seems correct... Of course, for "large" (>2^32) values it is a slow function.

It currently takes an ui for the argument, but it should not be impossible to implement it also for values larger than 32 bits on 32-bit CPUs... Is it worth trying?

Ĝis, m
--
http://bodrato.it/papers/
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: Doc tweak
Ciao Marc,

On 2021-11-06 13:35, Marc Glisse wrote:
> Someone got confused by the text saying that all the GMP declarations are in gmp.h and thought the C++ class interface would be there as well, so I'll probably add this patch unless someone has a better proposition.

I agree.

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: proposal to change the return type/value of mpz_primorial_ui
Ciao Shane,

On Sat, Nov 6, 2021 at 1:31 PM Marco Bodrato wrote:
>> On 2021-11-03 19:18, Shane Neph wrote:
>>> mpz_primorial_ui( ) is a great function. Why not return the actual number of primes that go into that calculation?
>> It may make sense, but only if we also add another function: a way to compute the number of primes in a range; or at least in the range [1..n], i.e. pi(n) for a given unsigned long n.

On 2021-11-06 22:29, Shane Neph wrote:
> That would be a nice solution.

If you really need both the primorial and the prime count, my solution is somehow a waste, because no memory of the sieved numbers is kept between the library calls. The numbers are sieved twice...

I got inspired by a piece of code that Seth Troisi sent a couple of years ago and I wrote a gmp_pi_ui(n) function with two variants: a trivial one (it calls gmp_primesieve) and an implementation of the Legendre strategy. I attach it for comments.

Compiled with -DMAIN it prints the number of primes up to increasing powers of two. On my computer, with small ranges it is slower than GP/Pari; for large ranges, it is faster...

Ĝis, m

/* pi (N) -- Returns the number of primes in the range [1..N].

Copyright 2019-2021 Marco Bodrato.
*/ #include "gmp-impl.h" #include "longlong.h" #ifndef PI_LEGENDRE_THRESHOLD #define PI_LEGENDRE_THRESHOLD 8 #endif /* n_fto_bit (n) = ((n-1)&(-CNST_LIMB(2)))/3U-1 */ static mp_limb_t n_fto_bit (mp_limb_t n) { return ((n-5)|1)/3U; } /* n_fto_bit(x) + 2 = gmp_legendre_pi_phi_2 (x) */ static mp_size_t primesieve_size (mp_limb_t n) { return n_fto_bit(n) / GMP_LIMB_BITS + 1; } static mp_limb_t limb_apprsqrt (mp_limb_t x) { int s; ASSERT (x > 2); count_leading_zeros (s, x); s = (GMP_LIMB_BITS - s) >> 1; return ((CNST_LIMB(1) << (s - 1)) + (x >> 1 >> s)); } /* phi(a, 0) = a * phi(a, b) = phi(a, b-1) - phi(a/p_b, b-1) * phi(a, 1) = phi(a, 0) - phi(a/2, 0) = a - a/2 * phi(a, 2) = phi(a, 1) - phi(a/3, 1) = a - a/2 - (a/3 - a/6) ~= a/3 * phi(a, b) = phi(a, 2) - sum_{1 0); /* if (primes == 0) */ /* return res; */ /* if (res * 3 == n) n -= 2; */ unsigned count = 0; unsigned long i = 4, *sp = sieve; do { /* Loop on primes */ for (unsigned long b = i, x = ~ *(sp++); x != 0; b += 3, x >>= 1) if (x & 1) { /* b = 4 + 3*k : * if k = 2n, b = 4 (mod 6) is even, the prime is b+1, *and b+2 = 0 (mod 6) is composite; * if k = 2n+1, b = 1 (mod 6) is odd, b is the prime, *and b+1 (even), b+2 = 3 (mod 6), b+2+(b%2) (even) are composite. */ unsigned long frac = n / (b | 1); if (frac <= (b + 2 + (b & 1))) return res - primes + count - (frac >= (b | 1)); if (count < 2) /* Directly compute, avoid recursion. */ res -= gmp_legendre_pi_phi_2 (frac) /* count == 0 */ - (count ? gmp_legendre_pi_phi_2 (frac / 5) : 0); /* count == 1 */ else res -= gmp_legendre_pi_phi(frac, count, sieve); if (primes == ++count) return res; } i += GMP_LIMB_BITS * 3; } while (1); } /* Takes a pre-sieved range of primes [5..maxprime]. * The range contains primes_in_sieve primes. * * Legendre showed that, if maxprime^2 >= n, then * primepi(n) = primepi(maxprime) + phi(n, primepi(maxprime)) - 1 * where phi (a, b) = numbers x <= a, not divisible by the first b primes. 
* * The sieve does not count 2 and 3, so that * primepi(maxprime) = primes_in_sieve + 2 * * Here we pass to pi_phi the prime index of b: primes_in_sieve. */ static long gmp_legendre_pi_ps (unsigned long n, unsigned primes_in_sieve, mp_ptr sieve) { return primes_in_sieve + 1 + gmp_legendre_pi_phi(n, primes_in_sieve, sieve); } static long gmp_legendre_pi_ui(unsigned long n) { unsigned long sqrt_n = limb_apprsqrt (n); unsigned sieve_size = primesieve_size (sqrt_n); TMP_SDECL; TMP_SMARK; mp_ptr sieve = TMP_SALLOC_LIMBS (sieve_size); unsigned count = gmp_primesieve (sieve, sqrt_n); long res = gmp_legendre_pi_ps (n, count, sieve); TMP_SFREE; return res; } /* TODO: the _primesieve interface should be extended, to allow piece-wise sieving. Storing the full sieve is wasteful here. */ static long gmp_persieve_pi_ui (unsigned long n) { mp_size_t size = primesieve_size (n); TMP_DECL; TMP_MARK; long res = gmp_primesieve (TMP_ALLOC_LIMBS (size), n); TMP_FREE; return res + 2; } long gmp_pi_ui (unsigned long n) { if (n < 7) return (n>1) + (n>2) + (n>4); if (BELOW_THRESHOLD (n, PI_LEGENDRE_THRESHOLD)) return gmp_persieve_pi_ui (n); else return gmp_legendre_pi_ui (n); } #ifdef MAIN int main() { unsigned step = 2; for (unsigned long n=1; n< ~0UL / step ; n *= step) gmp_printf ("%lu -> %ld\n", n, gmp_pi_ui (n)); } #endif ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
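The Legendre identity the attachment builds on, primepi(n) = primepi(sqrt(n)) + phi(n, primepi(sqrt(n))) - 1, can be illustrated without any of the GMP sieve machinery. The sketch below (hypothetical names, trial division instead of gmp_primesieve, no memoization of phi) is only meant to show the recursion; it is usable for n up to roughly 10^4, limited by the fixed prime table.

```c
#include <assert.h>

/* phi(a, b): how many 1 <= x <= a are divisible by none of the first b
   primes, via phi(a, b) = phi(a, b-1) - phi(a/p_b, b-1), phi(a, 0) = a.  */
static long phi (long a, int b, const long *primes)
{
  if (b == 0)
    return a;
  return phi (a, b - 1, primes) - phi (a / primes[b - 1], b - 1, primes);
}

/* Legendre: pi(n) = pi(sqrt(n)) + phi(n, pi(sqrt(n))) - 1.  */
static long legendre_pi (long n)
{
  if (n < 2)
    return 0;
  long s = 1;
  while ((s + 1) * (s + 1) <= n)   /* s = floor(sqrt(n)) */
    s++;
  long primes[64];                 /* enough for sqrt(n) up to ~300 */
  int b = 0;
  for (long p = 2; p <= s; p++)    /* collect primes <= s by trial division */
    {
      int is_p = 1;
      for (long d = 2; d * d <= p; d++)
        if (p % d == 0) { is_p = 0; break; }
      if (is_p)
        primes[b++] = p;
    }
  return b + phi (n, b, primes) - 1;
}
```

For n = 100: s = 10, the sieved primes are 2, 3, 5, 7, and phi(100, 4) counts 1 plus the 21 primes in (10, 100], giving 4 + 22 - 1 = 25.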
Re: proposal to change the return type/value of mpz_primorial_ui
Ciao,

On 2021-11-03 19:18, Shane Neph wrote:
> mpz_primorial_ui( ) is a great function. Why not return the actual number of primes that go into that calculation?

It may make sense, but only if we also add another function: a way to compute the number of primes in a range; or at least in the range [1..n], i.e. pi(n) for a given unsigned long n. Without that function, the only way to compute pi(n) in GMP would be: compute primorial(n), discard the result, and take the return value only.

Such a function does not belong to any of the mp[nzfq] layers... Can the prototype be something like the following?

long gmp_pi_ui(unsigned long n);

Of course, in this area GMP shall not claim to be the fastest available library ;-D

Ĝis, m
--
http://bodrato.it/papers/
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
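The interface idea is easy to model: any primorial loop visits exactly the primes up to n, so the prime count comes for free as a return value. A toy version in plain C (hypothetical name, trial division, 64-bit product that overflows past n = 52 — an illustration of the proposed return-value convention, not GMP code):

```c
#include <assert.h>
#include <stdint.h>

/* *prod = primorial(n) = product of all primes <= n;
   return value = how many primes were multiplied in, i.e. pi(n).  */
static unsigned long toy_primorial (uint64_t *prod, unsigned long n)
{
  unsigned long count = 0;
  *prod = 1;
  for (unsigned long p = 2; p <= n; p++)
    {
      int is_p = 1;
      for (unsigned long d = 2; d * d <= p; d++)
        if (p % d == 0) { is_p = 0; break; }
      if (is_p)
        {
          *prod *= p;
          count++;            /* the pi(n) count is a by-product */
        }
    }
  return count;
}
```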
Re: [Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)
Ciao,

On 2020-09-21 17:30, Marco Bodrato wrote:
>> On 2020-09-14 18:50, Shlomi Fish wrote:
>>> I was able to improve upon mpz_mod here: This is in case the number that one divides by is itself divisible by a large power of 2.
>> There are many special forms for the divisors that can stimulate writing special code to speed up the modulus computation. The form k*2^n is one of them.
>>> Is there interest in incorporating such a change to the core GMP library?
>> By the way, if you want to play with this, try the patch below for GMP, recompile it, and test your "benchmark" again.

I slightly reworked the patch. Should we apply it? It accelerates the functions mpz_?div_{qr,r} when the denominator has low zero limbs.

In the general case, the check should cost a single branch (does UNLIKELY make it well predicted?), and only a couple of additional operations on pointers are needed: rp + n0, and dl - n0.

We never specially handle such a simple special case in the mpn layer, but it may make sense for the mpz layer. We already handle low zero limbs in mpz_powm... Comments?
diff -r a9440b272ec5 mpz/tdiv_qr.c --- a/mpz/tdiv_qr.c Sun Oct 31 14:59:02 2021 +0100 +++ b/mpz/tdiv_qr.c Mon Nov 01 11:15:08 2021 +0100 @@ -36,7 +36,7 @@ void mpz_tdiv_qr (mpz_ptr quot, mpz_ptr rem, mpz_srcptr num, mpz_srcptr den) { - mp_size_t ql; + mp_size_t ql, n0; mp_size_t ns, ds, nl, dl; mp_ptr np, dp, qp, rp; TMP_DECL; @@ -95,7 +95,12 @@ np = tp; } - mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl); + for (n0 = 0; UNLIKELY (*dp == 0); ++dp) +{ + rp [n0++] = *np++; + --nl; +} + mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0); ql -= qp[ql - 1] == 0; MPN_NORMALIZE (rp, dl); diff -r a9440b272ec5 mpz/tdiv_r.c --- a/mpz/tdiv_r.c Sun Oct 31 14:59:02 2021 +0100 +++ b/mpz/tdiv_r.c Mon Nov 01 11:15:08 2021 +0100 @@ -35,7 +35,7 @@ void mpz_tdiv_r (mpz_ptr rem, mpz_srcptr num, mpz_srcptr den) { - mp_size_t ql; + mp_size_t ql, n0; mp_size_t ns, nl, dl; mp_ptr np, dp, qp, rp; TMP_DECL; @@ -88,7 +88,12 @@ np = tp; } - mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl); + for (n0 = 0; UNLIKELY (*dp == 0); ++dp) +{ + rp [n0++] = *np++; + --nl; +} + mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0); MPN_NORMALIZE (rp, dl); Ĝis, m -- http://bodrato.it/papers/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
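The patch above rests on a simple identity: with limb base B, if the denominator is d = d1*B^k (low k limbs zero), then floor division nests, q = floor(floor(n/B^k)/d1), and the low k limbs of the numerator pass straight into the remainder, r = (floor(n/B^k) mod d1)*B^k + (n mod B^k). A one-"limb" check with B = 2^32 playing the role of the limb base (hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Divide n by d = d1 * 2^32 (one low zero "limb" in base 2^32):
   copy the low limb of n into the remainder, divide only the high part,
   exactly as the patched mpz_tdiv_qr does with rp + n0 and dl - n0.  */
static void div_low_zero_limb (uint64_t n, uint32_t d1,
                               uint64_t *q, uint64_t *r)
{
  uint64_t nh = n >> 32;             /* high part of the numerator */
  uint64_t nl = n & 0xffffffffu;     /* low limb, copied verbatim into r */
  *q = nh / d1;
  *r = ((nh % d1) << 32) | nl;
}
```

The test below confirms the shortcut agrees with the direct 64-bit division.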
Re: Suggested tune/tuneup.c patch
Ciao,

On 2021-11-01 01:04, Torbjörn Granlund wrote:
> Marco Bodrato writes:
>> fac_ui... and I find an empty page if I look at https://gmplib.org/devel/thres/2021-10-30/FAC_ODD_THRESHOLD
> Still there, but for some reason nginx had stopped serving gz pages, and there were no non-gzip pages on the server. I made a quick workaround.

And the workaround works. Thanks!

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: Suggested tune/tuneup.c patch
Ciao,

On 2019-09-11 19:37, t...@gmplib.org wrote:
> I added a history preservation feature to the .../devel/thres/ pages. At 23:59 each night, all pages are copied to .../devel/thres/-MM-DD. (There is no index, one needs to type in the wanted date manually.)

Is this still active? I can access https://gmplib.org/devel/thres/2021-10-30/ , but I'd like to check how FAC_ODD_THRESHOLD evolves after my last commit to fac_ui... and I find an empty page if I look at https://gmplib.org/devel/thres/2021-10-30/FAC_ODD_THRESHOLD

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: Please update addaddmul_1msb0.asm to support ABI in mingw64
Ciao Niels,

On 2021-10-06 21:54, ni...@lysator.liu.se wrote:
> Now, question is if it can beat mul_1 + addmul_1. I don't know.

Well, the question is also whether the current code beats mul_1 + addmul_1 :-)

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: Please update addaddmul_1msb0.asm to support ABI in mingw64
Ciao,

On 2021-10-06 17:21, ni...@lysator.liu.se wrote:
> Marco Bodrato writes:
>> I agree. Let's choose between the two last versions, and I'll push it.
> Nice with a few instructions less, feel free to push the version you like best.

I pushed the last one: https://gmplib.org/repo/gmp/rev/8e8ae372361b

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: Please update addaddmul_1msb0.asm to support ABI in mingw64
Ciao Niels, Il 2021-10-05 20:56 ni...@lysator.liu.se ha scritto: Marco Bodrato writes: mov %r11, -16(rp) mov %r10, %r11 jmp L(one) I had hoped this jump and preceding instructions could be eliminated, to get a structure like ja L(two) jz L(one) L(nul): (no jumps to this label left) ... But might need other move instructions, to get the right data into the right registers? Maybe with a couple of move with computed address, and with the same value in r10 and r11 we can get rid of the jump. The difference, wrt the last code I sent, follows. *** *** 122,135 cmp $1, R32(n) ja L(two) ! jnz L(nul) ! ! mov -8(ap), %rax ! mov %r11, -16(rp) mov %r10, %r11 ! jmp L(one) ! L(nul): mov -16(ap), %rax ! mov %r11, -24(rp) ! mul %r8 add %rax, %r10 mov -16(bp), %rax --- 122,131 cmp $1, R32(n) ja L(two) ! mov -16(ap,n,8), %rax ! mov %r11, -24(rp,n,8) mov %r10, %r11 ! jz L(one) ! L(nul): mul %r8 add %rax, %r10 mov -16(bp), %rax So I think your version is an improvement as is, and perhaps not worth the effort to try to eliminate a few more instructions if this rather obscure function. I agree. Let's choose between the two last versions, and I'll push it. Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Please update addaddmul_1msb0.asm to support ABI in mingw64
Il 2021-10-05 07:38 ni...@lysator.liu.se ha scritto: Some potential further improvements: It would likely be possible to order the cases at the end as L(nul), L(one), L(two), and let the nul case fall through into the one case, reducing the size a bit. And a mulx version could likely eliminate a lot of the move instructions. Well, I added one more move to order the cases as you suggest. The code gets a little bit shorter. I think its performance matters mainly for gcdext in the Lehmer size range. (It's also used by hgcd base case, but for sizes where that is used, I'd guess that it's a pretty small part of the complete gcd computation). I also renamed registers, so that a push/pop couple is needed only if the loop is used; this may save a couple of cycles when the size is small. Does this make sense? I attach the possible new version. Ĝis, mdnl AMD64 mpn_addaddmul_1msb0, R = Au + Bv, u,v < 2^63. dnl Copyright 2008, 2021 Free Software Foundation, Inc. dnl This file is part of the GNU MP Library. dnl dnl The GNU MP Library is free software; you can redistribute it and/or modify dnl it under the terms of either: dnl dnl* the GNU Lesser General Public License as published by the Free dnl Software Foundation; either version 3 of the License, or (at your dnl option) any later version. dnl dnl or dnl dnl* the GNU General Public License as published by the Free Software dnl Foundation; either version 2 of the License, or (at your option) any dnl later version. dnl dnl or both in parallel, as here. dnl dnl The GNU MP Library is distributed in the hope that it will be useful, but dnl WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY dnl or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License dnl for more details. dnl dnl You should have received copies of the GNU General Public License and the dnl GNU Lesser General Public License along with the GNU MP Library. If not, dnl see https://www.gnu.org/licenses/. 
include(`../config.m4') C cycles/limb C AMD K8,K9 2.167 C AMD K10 2.167 C Intel P4 12.0 C Intel core2 4.0 C Intel corei ? C Intel atom ? C VIA nano ? C TODO C * Perhaps handle various n mod 3 sizes better. The code now is too large. C INPUT PARAMETERS define(`rp', `%rdi') define(`ap', `%rsi') define(`bp_param', `%rdx') define(`n', `%rcx') define(`u0', `%r8') define(`v0', `%r9') define(`bp', `%rbp') ABI_SUPPORT(DOS64) ABI_SUPPORT(STD64) ASM_START() TEXT ALIGN(16) PROLOGUE(mpn_addaddmul_1msb0) FUNC_ENTRY(4) IFDOS(` mov 56(%rsp), %r8 ') IFDOS(` mov 64(%rsp), %r9 ') push %rbp lea (ap,n,8), ap lea (bp_param,n,8), bp lea (rp,n,8), rp neg n mov (ap,n,8), %rax mul %r8 mov %rax, %r11 mov (bp,n,8), %rax mov %rdx, %r10 add $3, n jns L(end) push %r13 ALIGN(16) L(top): mul %r9 add %rax, %r11 mov -16(ap,n,8), %rax adc %rdx, %r10 mov %r11, -24(rp,n,8) mul %r8 add %rax, %r10 mov -16(bp,n,8), %rax mov $0, R32(%r13) adc %rdx, %r13 mul %r9 add %rax, %r10 mov -8(ap,n,8), %rax adc %rdx, %r13 mov %r10, -16(rp,n,8) mul %r8 add %rax, %r13 mov -8(bp,n,8), %rax mov $0, R32(%r11) adc %rdx, %r11 mul %r9 add %rax, %r13 adc %rdx, %r11 mov (ap,n,8), %rax mul %r8 add %rax, %r11 mov %r13, -8(rp,n,8) mov (bp,n,8), %rax mov $0, R32(%r10) adc %rdx, %r10 add $3, n js L(top) pop %r13 L(end): mul %r9 add %rax, %r11 adc %rdx, %r10 cmp $1, R32(n) ja L(two) jnz L(nul) mov -8(ap), %rax mov %r11, -16(rp) mov %r10, %r11 jmp L(one) L(nul): mov -16(ap), %rax mov %r11, -24(rp) mul %r8 add %rax, %r10 mov -16(bp), %rax mov $0, R32(%r11) adc %rdx, %r11 mul %r9 add %rax, %r10 mov -8(ap), %rax adc %rdx, %r11 mov %r10, -16(rp) L(one): mul %r8 add %rax, %r11 mov -8(bp), %rax mov $0, R32(%r10) adc %rdx, %r10 mul %r9 add %rax, %r11 adc %rdx, %r10 L(two): mov %r11, -8(rp) mov %r10, %rax L(ret): pop %rbp FUNC_EXIT() ret EPILOGUE() ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
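For reference, what mpn_addaddmul_1msb0 computes can be modeled in portable C with unsigned __int128 (assuming 64-bit limbs). The msb0 restriction u, v < 2^63 is exactly what makes a single double-word accumulator per limb sufficient: each product is below 2^127, so two products plus the incoming carry never overflow 128 bits. This is a reference model for checking results, not a performance implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* {rp,n} = {ap,n}*u + {bp,n}*v, returning the high limb.
   Requires u, v < 2^63: then ap[i]*u + bp[i]*v + carry < 2^128 always,
   since each product is at most (2^64-1)*(2^63-1) < 2^127.  */
static uint64_t ref_addaddmul_1msb0 (uint64_t *rp,
                                     const uint64_t *ap, const uint64_t *bp,
                                     long n, uint64_t u, uint64_t v)
{
  u128 acc = 0;
  for (long i = 0; i < n; i++)
    {
      acc += (u128) ap[i] * u + (u128) bp[i] * v;
      rp[i] = (uint64_t) acc;   /* low limb of the accumulator */
      acc >>= 64;               /* carry into the next position */
    }
  return (uint64_t) acc;
}
```

With full 64-bit u and v the sum could exceed 128 bits, which is why the general (not-yet-existing) addaddmul would need the extra carry handling discussed in the thread.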
Re: Please update addaddmul_1msb0.asm to support ABI in mingw64
Ciao Niels, you are the author of this code. Il 2020-06-07 21:48 Idigger Lee ha scritto: Please update addaddmul_1msb0.asm to support ABI in mingw64. While looking at this e-mail on gmp-bugs, I added DOS support and also reordered the branches around the exit code. Do you agree with those changes? 8<- *** /tmp/extdiff.X4CyBq/gmp-err.b03b3cb1ce98/mpn/x86_64/addaddmul_1msb0.asm 2020-11-16 21:41:15.0 +0100 --- /home/bodrato/src/gmp-err/mpn/x86_64/addaddmul_1msb0.asm 2020-10-29 04:37:12.369375684 +0100 *** define(`u0',`%r8') *** 51,64 --- 51,70 define(`v0', `%r9') define(`bp', `%rbp') + ABI_SUPPORT(DOS64) + ABI_SUPPORT(STD64) + ASM_START() TEXT ALIGN(16) PROLOGUE(mpn_addaddmul_1msb0) + FUNC_ENTRY(4) + IFDOS(` mov 56(%rsp), %r8 ') + IFDOS(` mov 64(%rsp), %r9 ') push%r12 push%rbp lea (ap,n,8), ap lea (bp_param,n,8), bp *** L(top): mul %r9 *** 105,170 mov $0, R32(%r10) adc %rdx, %r10 add $3, n js L(top) ! L(end): cmp $1, R32(n) ! ja 2f ! jz 1f ! ! mul %r9 add %rax, %r12 - mov -16(ap), %rax adc %rdx, %r10 ! mov %r12, -24(rp) mul %r8 add %rax, %r10 ! mov -16(bp), %rax mov $0, R32(%r11) adc %rdx, %r11 mul %r9 add %rax, %r10 - mov -8(ap), %rax adc %rdx, %r11 ! mov %r10, -16(rp) mul %r8 ! add %rax, %r11 ! mov -8(bp), %rax mov $0, R32(%r12) adc %rdx, %r12 mul %r9 ! add %rax, %r11 ! adc %rdx, %r12 ! mov %r11, -8(rp) ! mov %r12, %rax ! pop %rbp ! pop %r12 ! ret ! ! 1:mul %r9 ! add %rax, %r12 mov -8(ap), %rax ! adc %rdx, %r10 ! mov %r12, -16(rp) mul %r8 ! add %rax, %r10 mov -8(bp), %rax ! mov $0, R32(%r11) ! adc %rdx, %r11 mul %r9 - add %rax, %r10 - adc %rdx, %r11 - mov %r10, -8(rp) - mov %r11, %rax - pop %rbp - pop %r12 - ret - - 2:mul %r9 add %rax, %r12 - mov %r12, -8(rp) adc %rdx, %r10 mov %r10, %rax ! pop %rbp pop %r12 ret EPILOGUE() --- 111,164 mov $0, R32(%r10) adc %rdx, %r10 add $3, n js L(top) ! L(end): mul %r9 add %rax, %r12 adc %rdx, %r10 ! cmp $1, R32(n) ! ja L(two) ! jnz L(nul) ! ! L(one): mov -8(ap), %rax ! mov %r12, -16(rp) mul %r8 add %rax, %r10 ! 
mov -8(bp), %rax mov $0, R32(%r11) adc %rdx, %r11 mul %r9 add %rax, %r10 adc %rdx, %r11 ! mov %r10, -8(rp) ! mov %r11, %rax ! jmp L(ret) ! ! L(nul): mov -16(ap), %rax ! mov %r12, -24(rp) mul %r8 ! add %rax, %r10 ! mov -16(bp), %rax mov $0, R32(%r12) adc %rdx, %r12 mul %r9 ! add %rax, %r10 mov -8(ap), %rax ! adc %rdx, %r12 ! mov %r10, -16(rp) mul %r8 ! add %rax, %r12 mov -8(bp), %rax ! mov $0, R32(%r10) ! adc %rdx, %r10 mul %r9 add %rax, %r12 adc %rdx, %r10 + + L(two): mov %r12, -8(rp) mov %r10, %rax ! L(ret): pop %rbp pop %r12 + FUNC_EXIT() ret EPILOGUE() 8<- Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: div_qr_1n_pi1
Ciao,

On 2021-06-06 22:16, Torbjörn Granlund wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>> Maybe we should have some macrology for that? Or do all relevant processors and compilers support efficient cmov these days? I'm sticking to masking expressions for now.
> Let's not trust results from compiler generated code for these things. The mixture of inline asm and plain code is hard for compilers to deal with. Very subtle things can make a huge cycle count difference.

Of course, mixing asm and plain code will not leave the compiler much freedom... Should we check whether the compiler supports a larger type (e.g. unsigned __int128) and define the common macros add_ss and umul_ppmm based on it? In that case the compiler should be able to optimise also across the longlong-defined operations.

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
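A sketch of what such __int128-based definitions could look like, assuming a 64-bit limb and the GCC/Clang unsigned __int128 extension. The spelling add_ss follows this thread (GMP's longlong.h macro is add_ssaaaa); this is an illustration of the idea, not a drop-in replacement for longlong.h:

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* (ph:pl) = m0 * m1, full 128-bit product split into two limbs.  */
#define umul_ppmm(ph, pl, m0, m1)                       \
  do {                                                  \
    u128 _p = (u128) (m0) * (m1);                       \
    (ph) = (uint64_t) (_p >> 64);                       \
    (pl) = (uint64_t) _p;                               \
  } while (0)

/* (sh:sl) = (ah:al) + (bh:bl), carry between limbs handled by the
   128-bit addition; the compiler is free to fuse adjacent operations.  */
#define add_ss(sh, sl, ah, al, bh, bl)                  \
  do {                                                  \
    u128 _s = (((u128) (ah) << 64) | (uint64_t) (al))   \
            + (((u128) (bh) << 64) | (uint64_t) (bl));  \
    (sh) = (uint64_t) (_s >> 64);                       \
    (sl) = (uint64_t) _s;                               \
  } while (0)
```

Since the operations are plain arithmetic rather than inline asm, the compiler can indeed schedule and combine across them, which is the point raised in the message.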
Re: Error handler for GMP?
Ciao John! It's a pleasure to see you around here!

On 2021-03-22 09:55, abb...@dima.unige.it wrote:
> Does GMP offer a way to return/throw rather than simply aborting upon overflow?

No, it doesn't. Yet.

> I could not see anything relevant in the documentation.

The theme emerges every now and then on the list. And the problem is that it seems difficult, for the caller, to handle possible memory leaks and keep the internal state coherent, to do anything better than abort... But I think that lazy allocation, which we have had since GMP 6.2.0, can help.

I think it should be possible to write a function, using mpz_t or mpq_t only, that:
1) has a clear distinction between readable-only and writable-only parameters;
2) frees (mpz_clear) all the writable variables, and re-inits them (mpz_init, which is lazy now, and does not allocate anything);
3) sets (mp_set_memory_functions) memory functions that will keep track of any block allocated by GMP functions;
4) sets the callback on error...
5) does the heavy work;
6) resets callback and memory functions and returns "done".

If during step 5 an error occurs:
6) frees any memory block allocated after step 3;
7) re-inits all the writable variables (mpz_init, to set everything to a coherent state);
8) resets callback and memory functions and returns "fail".

Anyway, for internal coherence, I think that we should at least check that all the functions in the library that might "abort" do so using an "exception". For this, I propose the attached patch. Then, we might consider whether to let the user install a hook on the __gmp_exception function.
Ĝis, mdiff -r eac10a64bc99 errno.c --- a/errno.c Sun Jun 06 23:29:28 2021 +0200 +++ b/errno.c Sat Jun 12 16:46:13 2021 +0200 @@ -70,3 +70,8 @@ { __gmp_exception (GMP_ERROR_DIVISION_BY_ZERO); } +void +__gmp_overflow_in_mpz (void) +{ + __gmp_exception (GMP_ERROR_MPZ_OVERFLOW); +} diff -r eac10a64bc99 gmp-h.in --- a/gmp-h.in Sun Jun 06 23:29:28 2021 +0200 +++ b/gmp-h.in Sat Jun 12 16:46:13 2021 +0200 @@ -2325,7 +2325,8 @@ GMP_ERROR_UNSUPPORTED_ARGUMENT = 1, GMP_ERROR_DIVISION_BY_ZERO = 2, GMP_ERROR_SQRT_OF_NEGATIVE = 4, - GMP_ERROR_INVALID_ARGUMENT = 8 + GMP_ERROR_INVALID_ARGUMENT = 8, + GMP_ERROR_MPZ_OVERFLOW = 16 }; /* Define CC and CFLAGS which were used to build this version of GMP */ diff -r eac10a64bc99 gmp-impl.h --- a/gmp-impl.h Sun Jun 06 23:29:28 2021 +0200 +++ b/gmp-impl.h Sat Jun 12 16:46:13 2021 +0200 @@ -3924,10 +3924,12 @@ __GMP_DECLSPEC void __gmp_exception (int) ATTRIBUTE_NORETURN; __GMP_DECLSPEC void __gmp_divide_by_zero (void) ATTRIBUTE_NORETURN; __GMP_DECLSPEC void __gmp_sqrt_of_negative (void) ATTRIBUTE_NORETURN; +__GMP_DECLSPEC void __gmp_overflow_in_mpz (void) ATTRIBUTE_NORETURN; __GMP_DECLSPEC void __gmp_invalid_operation (void) ATTRIBUTE_NORETURN; #define GMP_ERROR(code) __gmp_exception (code) #define DIVIDE_BY_ZERO__gmp_divide_by_zero () #define SQRT_OF_NEGATIVE __gmp_sqrt_of_negative () +#define MPZ_OVERFLOW __gmp_overflow_in_mpz () #if defined _LONG_LONG_LIMB #define CNST_LIMB(C) ((mp_limb_t) C##LL) diff -r eac10a64bc99 mpz/init2.c --- a/mpz/init2.c Sun Jun 06 23:29:28 2021 +0200 +++ b/mpz/init2.c Sat Jun 12 16:46:13 2021 +0200 @@ -43,10 +43,7 @@ if (sizeof (unsigned long) > sizeof (int)) /* param vs _mp_size field */ { if (UNLIKELY (new_alloc > INT_MAX)) - { - fprintf (stderr, "gmp: overflow in mpz type\n"); - abort (); - } + MPZ_OVERFLOW; } PTR(x) = __GMP_ALLOCATE_FUNC_LIMBS (new_alloc); diff -r eac10a64bc99 mpz/realloc.c --- a/mpz/realloc.c Sun Jun 06 23:29:28 2021 +0200 +++ b/mpz/realloc.c Sat Jun 12 16:46:13 2021 +0200 @@ -44,18 +44,12 
@@ if (sizeof (mp_size_t) == sizeof (int)) { if (UNLIKELY (new_alloc > ULONG_MAX / GMP_NUMB_BITS)) - { - fprintf (stderr, "gmp: overflow in mpz type\n"); - abort (); - } + MPZ_OVERFLOW; } else { if (UNLIKELY (new_alloc > INT_MAX)) - { - fprintf (stderr, "gmp: overflow in mpz type\n"); - abort (); - } + MPZ_OVERFLOW; } if (ALLOC (m) == 0) diff -r eac10a64bc99 mpz/realloc2.c --- a/mpz/realloc2.c Sun Jun 06 23:29:28 2021 +0200 +++ b/mpz/realloc2.c Sat Jun 12 16:46:13 2021 +0200 @@ -43,10 +43,7 @@ if (sizeof (unsigned long) > sizeof (int)) /* param vs _mp_size field */ { if (UNLIKELY (new_alloc > INT_MAX)) - { - fprintf (stderr, "gmp: overflow in mpz type\n"); - abort (); - } + MPZ_OVERFLOW; } if (ALLOC (m) == 0) ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
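Step 3 of the recovery scheme described in this message needs allocators that remember every block they hand out, so that the failure path can release them all at once. A minimal bookkeeping layer in plain C (hypothetical names; the real hookup would go through mp_set_memory_functions, whose reallocate/free hooks also receive sizes — omitted here for brevity):

```c
#include <assert.h>
#include <stdlib.h>

/* Every allocation is prefixed with a list node, so that on a GMP
   "exception" the wrapper can walk the list and free everything.  */
struct tracked { struct tracked *next; };
static struct tracked *track_head = NULL;
static size_t track_count = 0;

static void *tracked_alloc (size_t size)
{
  struct tracked *b = malloc (sizeof *b + size);
  if (b == NULL)
    abort ();                  /* error handling elided in this sketch */
  b->next = track_head;
  track_head = b;
  track_count++;
  return b + 1;                /* user data lives after the header */
}

/* The failure path: release every block allocated since tracking began.  */
static void tracked_free_all (void)
{
  while (track_head != NULL)
    {
      struct tracked *next = track_head->next;
      free (track_head);
      track_head = next;
      track_count--;
    }
}
```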
Re: pointers vs arrays
Ciao,

On 2021-05-12 15:17, Marc Glisse wrote:
> The latest version of gcc has a new (questionable) warning that complains if a function is declared as taking an mpz_t parameter and redeclared as taking mpz_ptr, or the reverse. We might as well be consistent and use pointers everywhere, as in the attached patch. Does someone disagree?

Maybe this will move the warnings to the users' side :-/ By the way, I think that the documentation should also be updated accordingly.

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: div_qr_1n_pi1
Ciao,

On 2021-06-03 12:40, Torbjörn Granlund wrote:
> If we dare use cmov (and its presumed side-channel leakage) we could probably shorten the critical path by a cycle. The "sbb" and "and" would go away.

Using masks does not always give the fastest code. I tried the following variation on Niels' code and, on my laptop with "g++-10 -O2 -mtune=icelake-client -march=icelake-client", the resulting code is comparable with (faster than?) the current asm.

*** mpn_div_qr_1n_pi1 (mp_ptr qp, mp_srcptr
*** 245,266
       *     +       | q0|
       * -+---+---+---+
       *  | q2| q1| q0|
       *  +---+---+---+ */
!     umul_ppmm (p1, t, u1, dinv);
!     add_ss (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);
!     add_ss (q2, q1, q2, q1, CNST_LIMB(0), p1);
!     add_ss (q2, q1, q2, q1, CNST_LIMB(0), q0);
!     q0 = t;
      umul_ppmm (p1, p0, u1, B2);
-     ADDC_LIMB (cy, u0, u0, u2 & B2);
-     u0 -= (-cy) & d;
      /* Final q update */
!     add_ss (q2, q1, q2, q1, CNST_LIMB(0), cy);
      qp[j+1] = q1;
      MPN_INCR_U (qp+j+2, n-j-2, q2);
      add_mss (u2, u1, u0, u0, up[j], p1, p0);
    }
--- 245,264
       *     +       | q0|
       * -+---+---+---+
       *  | q2| q1| q0|
       *  +---+---+---+ */
!     ADDC_LIMB (q2, q1, q0, u1);
!     umul_ppmm (t, q0, u1, dinv);
!     ADDC_LIMB (cy, u0, u0, u2 ? B2 : 0);
!     u0 -= cy ? d : 0;
!     add_ss (q2, q1, q2, q1, -u2, u2 ? dinv : 0);
      umul_ppmm (p1, p0, u1, B2);
      /* Final q update */
!     add_ss (q2, q1, q2, q1, CNST_LIMB(0), t + cy);
      qp[j+1] = q1;
      MPN_INCR_U (qp+j+2, n-j-2, q2);
      add_mss (u2, u1, u0, u0, up[j], p1, p0);
    }

$ build/tune/speed -p1000 -s1-100 -f1.6 -C mpn_div_qr_1n_pi1.999
...
        ASM-code   C-code
  1      2.1227     3.6125
  2      3.1758     3.9425
  3      3.4567     3.8861
  4      3.4758     3.8606
  6      3.7857     3.8764
  9      3.9912     3.9676
 14      4.0304     4.0531
 22      4.3461     4.1798
 35      4.4161     4.2080
 56      4.4744     4.2833
 89      4.4896     4.2950

> (I am a bit fixated on side-channel leakage; our present implementations of these particular functions are not side-channel silent.)

We should write a fast version, and then a sec_ one :-)

Ĝis, m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
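The two styles under discussion compute the same value; for a condition c in {0, 1}, the mask form x - ((-c) & d) and the select form x - (c ? d : 0) agree, and the difference is only whether the compiler emits an AND with an all-ones/all-zeros mask or a branch/cmov. A small demonstration of the equivalence (hypothetical helper names):

```c
#include <assert.h>
#include <stdint.h>

/* Mask style, as in the original C and the asm: -(uint64_t)1 is all ones,
   -(uint64_t)0 is zero, so the AND selects d or 0 with no branch.  */
static uint64_t sub_if_mask (uint64_t x, uint64_t c, uint64_t d)
{
  return x - ((-c) & d);
}

/* Select style, as in the ?: variant; compilers often lower this to cmov,
   which is what shortens the critical path but may leak via the predictor
   less and via data-dependent issue timing more -- the trade-off debated.  */
static uint64_t sub_if_select (uint64_t x, uint64_t c, uint64_t d)
{
  return x - (c ? d : 0);
}
```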
Re: Side channel silent karatsuba / mpn_addmul_2 karatsuba
Ciao,

Since someone is asking about secure powm... I reply to an old message :-)

On 2018-12-13 07:05, ni...@lysator.liu.se wrote:
> t...@gmplib.org (Torbjörn Granlund) writes:
>> I am looking into doing karatsuba multiplication at evaluation points 0, +1, and infinity.

We usually evaluate at -1 instead of +1.

> It will be very interesting to see what thresholds we get with that.

At a first glance, I'd say around a dozen limbs higher than the non-sec thresholds. You can try, by replacing the two files mpn/generic/toom2{2_mul,_sqr}.c with the attached ones. Then make check and make tune, to see what happens. Of course the two functions should not replace the current ones, but that was the easiest way to test the new files :-)

>> The main advantage of evaluating in +1 is that it makes it more straightforward to avoid side channel leakage. (Currently, we completely avoid all o(n^2) multiplication algorithms in side channel sensitive contexts.)
> I don't think we should rule out using -1. It needs side-channel silent conditional negation, but that's not so hard. sec_invert.c implements

The attached code for squaring uses -1, i.e. it is strictly equivalent to the non-sec version. I only replaced the strategy to obtain the absolute value of the difference, and carry propagation. It also has exactly the same memory usage.

> mpn_cnd_neg in terms of mpn_lshift and mpn_cnd_sub_n, one could also

I'm recycling your code here, in mpn_sec_absub.

> What's most efficient is not at all clear to me: negation costs O(n), but so does handling of the extra top bits one gets with evaluation in +1.

That's true only for the last recursion level, when you fall into _basecase... I know that toom2_sqr enters the game later wrt toom22_mul. But in the _sec_ area, to compute the square of a number we use sqr_basecase if the number is small enough, and mul_basecase for larger integers... so that a Karatsuba function may be more important for squaring than for the product...
Ĝis, m

/* mpn_sec_toom2_sqr -- Square {ap,an}.

   Contributed to the GNU project by Torbjorn Granlund.
   Modified by Marco BODRATO to obtain a side-channel silent version.

   Copyright 2006-2010, 2012, 2014, 2018, 2020 Free Software Foundation, Inc.

   This code is free software; you can redistribute it and/or modify it
   under the terms of the GNU Affero General Public License as published
   by the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   You should have received a copy of the GNU Affero General Public
   License.  If not, see https://www.gnu.org/licenses/.  */

#include "gmp-impl.h"

static mp_limb_t
mpn_sec_add (mp_ptr rp, mp_srcptr ap, mp_size_t an,
	     mp_srcptr bp, mp_size_t bn, mp_ptr tp)
{
  mp_limb_t cy;
  ASSERT (an >= bn);
  cy = mpn_add_n (rp, ap, bp, bn);
  if (an != bn)
    cy = mpn_sec_add_1 (rp + bn, ap + bn, an - bn, cy, tp);
  return cy;
}

static mp_size_t
mpn_sec_add_itch (mp_size_t an, mp_size_t bn)
{
  ASSERT (an >= bn);
  return mpn_sec_add_1_itch (an - bn);
}

static mp_limb_t
mpn_sec_sub (mp_ptr rp, mp_srcptr ap, mp_size_t an,
	     mp_srcptr bp, mp_size_t bn, mp_ptr tp)
{
  mp_limb_t bw;
  ASSERT (an >= bn);
  bw = mpn_sub_n (rp, ap, bp, bn);
  if (an != bn)
    bw = mpn_sec_sub_1 (rp + bn, ap + bn, an - bn, bw, tp);
  return bw;
}

static mp_size_t
mpn_sec_sub_itch (mp_size_t an, mp_size_t bn)
{
  ASSERT (an >= bn);
  return mpn_sec_sub_1_itch (an - bn);
}

static mp_limb_t
mpn_sec_absub (mp_ptr rp, mp_srcptr ap, mp_size_t an,
	       mp_srcptr bp, mp_size_t bn, mp_ptr tp)
{
  mp_limb_t bw;
  ASSERT (an >= bn);
  bw = mpn_sec_sub (rp, ap, an, bp, bn, tp);
  /* mpn_cnd_neg_ip (bw, rp, n, tp); */
#if 0
  mpn_sec_sub_1 (rp, rp, an, bw, tp);
  MPN_FILL (tp, an, -bw);
  mpn_xor_n (rp, rp, tp, an);
#else
  mpn_lshift (tp, rp, an, 1);
  mpn_cnd_sub_n (bw, rp, rp, tp, an);
#endif
  return bw;
}

static mp_size_t
mpn_sec_absub_itch (mp_size_t an, mp_size_t bn)
{
  ASSERT (an >= mpn_sec_sub_itch (an, bn));
  ASSERT (an >= mpn_sec_sub_1_itch (an));
  return an;
}

/* Evaluate in: -1, 0, +inf

  <-s--><--n-->
   ____ ______
  |_a1_|__a0__|

  v0   = a0      ^2  # A(0)^2
  vm1  = (a0- a1)^2  # A(-1)^2
  vinf = a1      ^2  # A(inf)^2
*/

#if TUNE_PROGRAM_BUILD || WANT_FAT_BINARY
#define MAYBE_sqr_toom2 1
#else
#define MAYBE_sqr_toom2 \
  (SQR_TOOM3_THRESHOLD >= 2 * SQR_TOOM2_THRESHOLD)
#endif

#define TOOM2_SQR_REC(p, a, n, ws)				\
  do {								\
    if (! MAYBE_sqr_toom2					\
	|| BELOW_THRESHOLD (n, SQR_TOOM2_THRESHOLD))		\
      mpn_sqr_basecase (p, a, n);				\
    else							\
      mpn_toom2_sqr (p, a, n, ws);				\
  } while (0)

void
mpn_toom2_sqr (mp_ptr pp, mp_srcptr ap, mp_size_t an, mp_ptr scratch)
{
  const int __gmpn_cpuvec_initialized = 1;
  mp_size_t n, s;
  mp_limb_t cy, cy2;
  mp_ptr asm1;

#define a0  ap
#define a1  (a
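As a side note (standard Toom-2/Karatsuba algebra, not text from the original mail): writing the operand as A = a1*B + a0, the three values computed by the attached file combine as

```latex
% Toom-2 squaring at points {0, -1, +inf}; B = beta^n, beta = 2^GMP_NUMB_BITS.
\[
  v_0 = a_0^2, \qquad v_{-1} = (a_0 - a_1)^2, \qquad v_\infty = a_1^2 ,
\]
\[
  A^2 = v_\infty B^2 + \left( v_0 + v_\infty - v_{-1} \right) B + v_0 .
\]
```

Since v_{-1} is itself a square, its sign never matters; only the absolute difference |a0 - a1| is needed, which is why the code computes it with a side-channel silent mpn_sec_absub.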
Re: mpz_prevprime
Ciao,

On 2020-11-16 21:23, Seth Troisi wrote:
> Thanks Marco! I'm really happy this is merged.

Your code was merged together with a storm of new code that was later
reverted. But now it's in again!

> Was there any specific things you thought could be improved? Happy to
> look at those too.

Memory usage, in my opinion, is not optimal. The current code generates
a primesieve, just to loop on it once to build a list of gaps, which in
turn is usually used just once... I'm still not completely convinced
that this structure is the best we can do.

In the meantime I wonder whether a further modularization could allow us
to build a type to store the "current state" and functions to
sequentially find the next primes...

Ĝis, m

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao,

On 2020-10-16 09:51, Seth Troisi wrote:
> I included a patch with the rename but not including Marco's tdiv

I pushed most of the changes you proposed. I only removed some portion
of the proposed tests/mpz/t-nextprime.c.

You are right, the return value of the new function should also be
tested, but I think that we should find a way to check both the result
and the return value of single runs of the function. E.g. test_largegap
can check that mpz_prevprime always returns 1 or 2, and run_p may take
another parameter so that we can check (depending on the range) if 2 is
returned as expected, or 1 is enough...

Let's see if we can improve it even more :-)

Ĝis, m
Re: 6.2.0 build failure on x86_64 Skylake PC - FIX
Ciao,

On 2020-10-15 07:14, Marco Bodrato wrote:
> we are not changing much the gmp-6.2 branch, and we should release.
> Is there something important we are missing as a bug fix?

I searched the gmp-devel mailing list and we failed to take any action
after the following, about msys. It seems that we currently are not
handling *-*-msys... Is one of the developers able to test whether the
proposed change is useful? Or is something more needed, as Marc Glisse
suggests?

On 2020-06-08 21:52, Marc Glisse wrote:
> On Sat, 30 May 2020, tsurzin wrote:
>
>> This change worked to build, test and run a version of GMP-6.2.0 for
>> my PC. [handle *-*-msys exactly the same as *-*-mingw* in
>> configure.ac] The PC is running Msys2 under Windows 10 and without
>> the change GMP failed to build.
>
> configfsf.guess does mention a triple *-pc-msys, so I guess it makes
> sense to handle it (or replace it with something we already handle). I
> don't really know in what ways it differs from mingw, probably not
> that much as far as GMP is concerned.
>
> I notice in a generated file:
> aclocal.m4:      *-*-mingw* ) # actually msys
> Should automake also be taught about the msys triple?

(the thread is available at
https://gmplib.org/list-archives/gmp-bugs/2020-May/thread.html )

Ĝis, m
Re: mpz_prevprime
Ciao,

On 2020-10-16 09:51, Seth Troisi wrote:
> won't this cause m = 2*prime, instead of the original result, m = 0?
> I used
>   m = (diff > 0 && m) ? prime - m : m;
> to make sure that p can be marked as composite.

Oops, you are right!

I fear that the short-circuit evaluation of && can force the compiler to
branch... (diff > 0) & (m != 0) should be computed branchless...

But that's a stupid detail; as you said, most of the time is spent in
the mod function or in millerrabin :-)

Ĝis, m
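A tiny standalone sketch (the helper name is hypothetical, not from the patch) of the branchless form suggested above:

```c
#include <assert.h>

/* Branchless form discussed in the thread: bitwise & evaluates both
   operands, so no short-circuit branch is needed; compilers typically
   turn the remaining ?: into a conditional move.  */
static unsigned long
offset_step (unsigned long m, unsigned long prime, int diff)
{
  int flip = (diff > 0) & (m != 0);
  return flip ? prime - m : m;
}
```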
Re: mpz_prevprime
Post scriptum:

On Fri, 16 October 2020, 7:13 am, Marco Bodrato wrote:
>   m = mpz_tdiv_ui (p, prime);
>   m = (diff > 0) ? prime - m : m;

Note that, in this context, if p = 0 (mod prime), the result m = prime
is as good as the result m = 0. Because the next two lines are:

  if (m & 1)
    m += prime;

Ĝis, m
Re: mpz_prevprime
Ciao,

On Fri, 16 October 2020, 1:47 am, Seth Troisi wrote:
> On Thu, Oct 15, 2020 at 10:44 AM Niels Möller wrote:
>> Seth Troisi writes:
>>> Technically I can restructure the code to make both use mpz_fdiv_ui
>>>
>>>  static int
>>>  findnext (mpz_ptr p,
>>> -          unsigned long(*nextmod_func)(const mpz_t, unsigned long),
>>> +          char direction,
>>>            void(*nextseq_func)(mpz_t, const mpz_t, unsigned long))
>>
>> I see. I don't think it's worth taking out a function pointer if it's
>> replaced with a flag and a couple of if statements. It would be neat
>> if it could be done similarly to findnext_small

Well, both mpz_fdiv and mpz_cdiv have if statements in their code, to
handle positive and negative numbers differently. We can avoid those
branches using tdiv, and add one here (the compiler may use a cmov).
I mean:

  /* we have 0 < prime <= 15001 < 2^28, because of calculate_sievelimit */
#if GMP_NUMB_BITS >= 28
  m = mpn_mod_1 (PTR (p), SIZ (p), (mp_limb_t) prime);
#else
  m = mpz_tdiv_ui (p, prime);
#endif
  m = (diff > 0) ? prime - m : m;

>> with diff being +2 or -2 (and possibly generalized further), and no
>> conditionals, e.g.,
>>
>>   mpz_add_si (p, p, diff * (odds_in_composite_sieve-1));

We don't have such a function :-) But adding a static function like that
here, assuming that p is always positive and greater than |si|, with a
single branch... can be a good idea.

Ĝis, m
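For a single machine word (not for a multi-limb mpz, which is what the thread is about), two's-complement wraparound already makes such a signed step branch-free. A small illustrative sketch, with a hypothetical name:

```c
/* On unsigned types, arithmetic is modulo 2^N, so adding the converted
   signed offset is correct for either sign of diff.  A multi-limb
   mpz_add_si would still need one branch on the sign of diff.  */
static unsigned long
step (unsigned long p, long diff)
{
  return p + (unsigned long) diff;
}
```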
Re: mpz_prevprime
Ciao,

On 2020-10-03 03:58, Seth Troisi wrote:
> On Thu, Mar 26, 2020 at 2:00 PM Marco Bodrato wrote:
>> On 2020-03-25 02:25, Seth Troisi wrote:
>>> + t = diff > 0 ? ((t + 1) | (t > 1)) :
>>> +     ((t == 3) ? 2 : ((t - 2) | 1));
>>
>> Maybe move this to the caller side? Or partially, leaving here just
>>   ASSERT (t >= 2);
>>   t |= (t != 2);
>
> I moved this to the caller side (so that both findnext and

Really? :-) But it's ok anyway.

>> I would also check many gaps around 2^{16n}, to check if everything
>> works correctly when the search crosses the limb boundaries.
>> Maybe structuring the test with a table {"number(base16?)", gap},
>> i.e. with (also) something like:
>>   {"65521", 16},
>>   {"4294967291", 20},
>>   {"281474976710597", 80},
>>   {"18446744073709551557", 72},
>>   {"1208925819614629174706111", 78},
>>   {"79228162514264337593543950319", 78},
>>   {"5192296858534827628530496329220021", 100},
>>   {"340282366920938463463374607431768211297", 210},
>
> I did this (using hex form) I threw in some 16*n-1 also

I'd really add some more tests around boundaries... for both next and
prev.

Ĝis, m
Re: mpz_prevprime
Ciao,

On 2020-10-03 03:58, Seth Troisi wrote:
> I modified the patch a tiny bit. Still hoping to get this in for an

I think that the patch is interesting: a function for searching primes
backward in the sequence of integers is missing and seems useful.

The proposed interface is the following.

@deftypefun int mpz_prevprime (mpz_t @var{rop}, const mpz_t @var{op})
@cindex Previous prime function
Set @var{rop} to the greatest prime less than @var{op}.
If a previous prime doesn't exist (i.e. @var{op} < 3), @var{rop} is
unchanged and 0 is returned.
Return 1 if @var{rop} is probably prime, and 2 if @var{rop} is
definitely prime.

I personally do not like the idea that a previous prime can "not exist",
because in my opinion -2 is a prime, and there are as many negative
primes as there are positive ones... The function mpz_probab_prime_p in
our library agrees with my opinion... but... ok, that's my opinion only.

Anyway, the return value is also used to return something more
interesting: 1 or 2, with the same meaning that the return value has for
the function mpz_probab_prime_p.

Should we add a return value also to the function mpz_nextprime?

Ĝis, m
Re: mpn_set_str_bits
Ciao,

On 2020-09-30 19:37, ni...@lysator.liu.se wrote:
> Marco Bodrato writes:
>>	  limb = (unsigned int) sp[sn] >> (bits - shift);
>
> That's easier to read than what I proposed. Maybe worth a comment
> mentioning the problem case: mp_limb_t

Thanks Niels! So, here is the current proposed rewrite:

static mp_size_t
mpn_set_str_bits (mp_ptr rp, const unsigned char *sp, size_t sn,
		  unsigned bits)
{
  mp_size_t rn;
  mp_limb_t limb;
  unsigned shift;

  for (limb = 0, rn = 0, shift = 0; sn-- > 0; )
    {
      limb |= (mp_limb_t) sp[sn] << shift;
      shift += bits;
      if (shift >= GMP_LIMB_BITS)
	{
	  shift -= GMP_LIMB_BITS;
	  rp[rn++] = limb;
	  /* Next line is correct also if shift == 0,
	     bits == 8, and mp_limb_t == unsigned char.  */
	  limb = (unsigned int) sp[sn] >> (bits - shift);
	}
    }
  if (limb != 0)
    rp[rn++] = limb;
  else
    rn = mpn_normalized_size (rp, rn);
  return rn;
}

It seems simple enough. Any further comment?

Ĝis, m
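A standalone harness (not part of GMP; the types are deliberately shrunk, and the normalization step of the real code is omitted) showing the rewrite in the extreme configuration discussed in this thread, where bits == GMP_LIMB_BITS == 8:

```c
#include <assert.h>
#include <stddef.h>

/* Mimic the unofficial mini-gmp configuration with an 8-bit limb, so
   that base 256 (bits = 8) makes bits - shift == 8 reachable.  */
typedef unsigned char mp_limb_t;
typedef long mp_size_t;
#define GMP_LIMB_BITS 8

static mp_size_t
set_str_bits (mp_limb_t *rp, const unsigned char *sp, size_t sn,
              unsigned bits)   /* bits <= 8 in this harness */
{
  mp_size_t rn;
  mp_limb_t limb;
  unsigned shift;

  for (limb = 0, rn = 0, shift = 0; sn-- > 0; )
    {
      limb |= (mp_limb_t) sp[sn] << shift;
      shift += bits;
      if (shift >= GMP_LIMB_BITS)
        {
          shift -= GMP_LIMB_BITS;
          rp[rn++] = limb;
          /* The cast promotes to unsigned int, so the shift count is
             below the operand width even when bits - shift == 8.  */
          limb = (unsigned int) sp[sn] >> (bits - shift);
        }
    }
  if (limb != 0)
    rp[rn++] = limb;
  return rn;
}
```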
Re: mpn_set_str_bits
Ciao!

On 2020-09-30 09:03, ni...@lysator.liu.se wrote:
> Marco Bodrato writes:
>>	  limb = sp[sn];
>>	  if (GMP_LIMB_BITS > CHAR_BIT || shift > 0)
>>	    limb >>= bits - shift;
>>	  else
>>	    limb = 0;
>
> Do we really need to support bits == GMP_LIMB_BITS here? If not, the

About mpn_set_str, the manual reads "base can vary from 2 to 256".
Moreover we are fool enough to (unofficially) support
"typedef unsigned char mp_limb_t;" in mini-gmp...

> above 5 lines could be simplified to just
>
>   limb = (sp[sn] >> 1) >> (bits - 1 - shift);
>
> it should be safe in all cases.

I agree. Anyway, there may be a problem only if we shift sp[sn]
(unsigned char) or limb (possibly the same type) by 8 bits (if bits=8
and shift=0).

An even simpler solution could be to cast to unsigned int (at least 16
bits, right?) so:

  limb = (unsigned int) sp[sn] >> (bits - shift);

Maybe the cast is unnecessary, because the C standard says that the left
operand of a shift operation is always promoted to (unsigned) int if it
is smaller, correct? But it shouldn't be dangerous.

Ĝis, m
Re: mpn_set_str_bits
Ciao,

On 2020-09-29 12:15, Raphael Rieu-Helft wrote:
> The attached patch slightly improves the mini-gmp function
> mpn_set_str_bits. The invariants are also a bit clearer (shift is the

The loop in mpz_import uses another strategy, a temporary limb. This
reduces the number of write operations into memory.

May I propose an alternative rewrite? What do you think about it?

static mp_size_t
mpn_set_str_bits (mp_ptr rp, const unsigned char *sp, size_t sn,
		  unsigned bits)
{
  mp_size_t rn;
  mp_limb_t limb;
  unsigned shift;

  for (limb = 0, rn = 0, shift = 0; sn-- > 0; )
    {
      limb |= (mp_limb_t) sp[sn] << shift;
      shift += bits;
      if (shift >= GMP_LIMB_BITS)
	{
	  rp[rn++] = limb;
	  shift -= GMP_LIMB_BITS;
	  limb = sp[sn];
	  if (GMP_LIMB_BITS > CHAR_BIT || shift > 0)
	    limb >>= bits - shift;
	  else
	    limb = 0;
	}
    }
  if (limb != 0)
    rp[rn++] = limb;
  else
    rn = mpn_normalized_size (rp, rn);
  return rn;
}

Ĝis, m
Re: State of PRNG code in GMP
Ciao,

On 2020-06-02 11:22, t...@gmplib.org wrote:
> I'd like to have a coherent interface which also provides reentrant
> random functions to the mpn layer. And in no case should mpn pull in
> mpz.

Makes sense.

> Unfortunately, Mersenne Twister uses mpz, and I am not saying that that
> was a bad choice when you implemented it. But in order to provide
> reentrant mpn PRNG functions, we either rewrite the relevant parts of
> MT to use mpn, or exclude it from a new mpn PRNG interface.

Mersenne Twister only uses mpz for initialization.

Moreover there is a little "bug" in the initialization procedure, so
that the sequence can be the same even if the seed is different (in the
range where it is supposed to generate different sequences). That's why
some years ago we started rewriting that init function. Of course this
will yield different sequences too.

https://gmplib.org/list-archives/gmp-bugs/2017-March/004106.html

Here is the almost mpz-free init function using the xxtea scrambler.

#define MX (((z >> 5 ^ y << 2) + (y >> 3 ^ z << 4)) \
	    ^ ((sum ^ y) + (k[(p & 3) ^ e] ^ z)))
#define DELTA 0x9E3779B9

static void
mangle_buf (gmp_uint_least32_t *buf, int high_bit)
{
  /* Apply XXTEA decryption with a constant key. XXTEA offers good
     randomness and speed properties. */
  static const gmp_uint_least32_t key[8] =
    {0x4BEDAF6D, 0x674DD5FB, 0xB79D42BC, 0x94C371EA,
     0x5443092C, 0xA67C9FE2, 0x31CC686A, 0xC41175D6};
  const gmp_uint_least32_t *k;
  gmp_uint_least32_t z;
  gmp_uint_least32_t y = buf[0] & 0xffffffff;
  gmp_uint_least32_t sum = ((DELTA << 1) + DELTA) << 1; /* DELTA*6 */
  int p, e;

  k = high_bit ? key + 4 : key;
  do
    {
      e = sum >> 2 & 3;
      p = N - 2;
      do
	{
	  z = buf[p - 1] & 0xffffffff;
	  y = (buf[p] -= MX) & 0xffffffff;
	} while (--p);
      z = buf[N - 2] & 0xffffffff;
      y = (buf[0] -= MX) & 0xffffffff;
    } while (sum -= DELTA);
}

/* Seeding function. */
static void
xxtea_randseed_mt (gmp_randstate_t rstate, mpz_srcptr seed)
{
  int i, high_bit;
  size_t cnt;
  gmp_rand_mt_struct *p;
  mpz_t seed1;			/* Intermediate result. */
  mp_limb_t mp_d[BITS_TO_LIMBS (19936)];

  high_bit = mpz_tstbit (seed, 19936);

  p = (gmp_rand_mt_struct *) RNG_STATE (rstate);

  PTR (seed1) = mp_d;
  ALLOC (seed1) = BITS_TO_LIMBS (19936);
  SIZ (seed1) = 0;

  mpz_fdiv_r_2exp (seed1, seed, 19936);
  mpz_export (&p->mt[1], &cnt, -1, sizeof (p->mt[1]), 0,
	      CHAR_BIT * sizeof (p->mt[1]) - 32, seed1);
  cnt++;
  ASSERT (cnt <= N);
  while (cnt < N)
    p->mt[cnt++] = 0;

  mangle_buf (&p->mt[1], high_bit); /* Perform the mangling of the buffer. */

  p->mt[0] = 0x80000000;	/* Set the first bit. */
  if (!high_bit)
    {
      cnt = N - 1;
      do
	{
	  if (p->mt[cnt] != 0)
	    {
	      p->mt[0] = 0;	/* Unset the first bit. */
	      break;
	    }
	} while (--cnt != 0);
    }

  /* Warm the generator up if necessary. */
  if (WARM_UP != 0)
    for (i = 0; i < WARM_UP / N; i++)
      __gmp_mt_recalc_buffer (p->mt);

  p->mti = WARM_UP % N;
}

> Here is some code I have tinkered with.
>
> typedef enum
> {
>   GMP_PRNG_ALG_LC, GMP_PRNG_ALG_MT, GMP_PRNG_ALG_AES, GMP_PRNG_ALG_LIM,
>   GMP_PRNG_ALG_DEFAULT = GMP_PRNG_ALG_AES
> } gmp_prng_alg;

What's LIM?

Ĝis, m
Re: mini-gmp warnings
On 2020-04-26 16:22, ni...@lysator.liu.se wrote:
> Is there an easy way to run mini-gmp tests with small limb size?

In mini-gmp/mini-gmp.h we have the following lines:

#ifndef MINI_GMP_LIMB_TYPE
#define MINI_GMP_LIMB_TYPE long
#endif

typedef unsigned MINI_GMP_LIMB_TYPE mp_limb_t;

So, you should define MINI_GMP_LIMB_TYPE to something like int, short,
or char. The following line works in my environment:

make CPPFLAGS="-DMINI_GMP_LIMB_TYPE=char" check-mini-gmp

Ĝis, m
Re: mini-gmp warnings
Ciao,

On 2020-04-26 15:11, ni...@lysator.liu.se wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>> There are a couple of other uses of LOCAL_SHIFT_BITS,
>> LOCAL_GMP_LIMB_BITS, LOCAL_CHAR_BIT, added as part of the support for
>> testing with small limb sizes. Is it for some reason important to refer
>
> Now I've tried changing it, and it seems to also be a hack to avoid
> warnings?

Exactly.

> In this macro,
>
> #define gmp_umul_ppmm(w1, w0, u, v)					\
>   do {									\
>     int LOCAL_GMP_LIMB_BITS = GMP_LIMB_BITS;				\
>     if (sizeof(unsigned int) * CHAR_BIT >= 2 * GMP_LIMB_BITS)		\
>       {									\
> 	unsigned int __ww = (unsigned int) (u) * (v);			\
> 	w0 = (mp_limb_t) __ww;						\
> 	w1 = (mp_limb_t) (__ww >> LOCAL_GMP_LIMB_BITS);			\
>       }									\
>
> if I change the shift to
>
> 	w1 = (mp_limb_t) (__ww >> GMP_LIMB_BITS);			\

Yes, and even more warnings are triggered if mp_limb_t is unsigned
char...

The idea of the hack is: if the compiler is not optimising, a
non-constant shift is compiled; if the compiler optimises, a "variable"
with a non-changing value should be removed, and the branch should be
totally skipped.

I agree with your first proposal: use unsigned. It should be good even
with -O0.

Ĝis, m
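The branch being discussed can be shown in isolation. A sketch (not the mini-gmp macro itself) with a 16-bit limb standing in for mp_limb_t, where unsigned int is wide enough to hold a full double-limb product:

```c
#include <stdint.h>

typedef uint16_t limb_t;
#define LIMB_BITS 16

/* When unsigned int is at least twice the limb width, a single
   widening multiply yields both halves of the product; the shift by
   LIMB_BITS extracts the high half.  */
static void
umul_ppmm16 (limb_t *w1, limb_t *w0, limb_t u, limb_t v)
{
  unsigned int ww = (unsigned int) u * v;
  *w0 = (limb_t) ww;
  *w1 = (limb_t) (ww >> LIMB_BITS);
}
```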
Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p
Ciao,

On 2020-04-20 11:08, Vincent Lefevre wrote:
> I think that in general, you should not write code that depends on
> whether INT_MAX + INT_MIN == 0 or not (the constant INT_MAX + INT_MIN
> might be useful in some rare cases, but I think that testing whether
> this constant is 0 or not should be forbidden). This can mean that

Forbidden? Really! :-D

Anyway, using the numerical constant INT_MAX + INT_MIN is a good idea.
What about the following version of the function? :-D

int
mpz_fits_sint_p (const mpz_t u)
{
  return mpz_cmpabs_ui (u, (unsigned long) INT_MAX
			+ (unsigned long) (u->_mp_size < 0 ?
					   -(INT_MAX + INT_MIN) : 0)) <= 0;
}

By the way, jokes aside, Niels is right: we do not need to optimise this
function. Please Niels, choose the implementation you prefer and change
also mpz_fits_slong_p accordingly.

Ĝis, m
Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p
On 2020-04-19 10:44, ni...@lysator.liu.se wrote:
> Marco Bodrato writes:
>> +int
>> +mpz_fits_sint_p (const mpz_t u)
>> +{
>> +  return (INT_MAX + INT_MIN == 0 || mpz_cmp_ui (u, INT_MAX) <= 0) &&
>> +    mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0;
>> +}
>
> I think this and mpz_fits_sshort_p would be simpler using mpz_cmp_si,
>
> int
> mpz_fits_sint_p (const mpz_t u)
> {
>   return mpz_cmp_si (u, INT_MAX) <= 0 && mpz_cmp_si (u, INT_MIN) >= 0;
> }

The current implementation of _cmp_si in mini-gmp is:

int
mpz_cmp_si (const mpz_t u, long v)
{
  mp_size_t usize = u->_mp_size;

  if (v >= 0)
    return mpz_cmp_ui (u, v);
  else if (usize >= 0)
    return 1;
  else
    return - mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, v));
}

So that the compiler may optimise your code to:

  return mpz_cmp_ui (u, INT_MAX) <= 0 &&
    (u->_mp_size >= 0 ||
     mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0);

which basically is the condition I wrote, with one more branch.

> BTW, do we have any C implementation where INT_MAX + INT_MIN == 0,
> i.e., not using two's complement?

I'm almost sure the compiler can optimise that out at compile time.

Ĝis, m
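GMP_NEG_CAST, used throughout the snippets above, computes |v| for a negative signed v without overflowing. A standalone equivalent for long (the function name is made up for illustration):

```c
#include <limits.h>

/* Equivalent of GMP's GMP_NEG_CAST (unsigned long, v) for negative v:
   negate v + 1 first, so even v == LONG_MIN never overflows, then add
   the 1 back after the conversion to unsigned.  */
static unsigned long
neg_cast_ulong (long v)
{
  return (unsigned long) (-(v + 1)) + 1;
}
```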
Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p
Ciao,

On the gmp-discuss list, on 2020-04-11 21:41, Simon Sobisch wrote:
> mini-gmp provides mpz_fits_slong_p and mpz_fits_ulong_p, but it does
> not provide the same for smaller integer types.

We can easily add the requested functions. I suggest the following code:

diff -r 805304ca965a mini-gmp/mini-gmp.c
--- a/mini-gmp/mini-gmp.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.c	Sun Apr 19 09:47:39 2020 +0200
@@ -1565,6 +1565,32 @@
   return us >= 0 && mpn_absfits_ulong_p (u->_mp_d, us);
 }
 
+int
+mpz_fits_sint_p (const mpz_t u)
+{
+  return (INT_MAX + INT_MIN == 0 || mpz_cmp_ui (u, INT_MAX) <= 0) &&
+    mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0;
+}
+
+int
+mpz_fits_uint_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, UINT_MAX) <= 0;
+}
+
+int
+mpz_fits_sshort_p (const mpz_t u)
+{
+  return (SHRT_MAX + SHRT_MIN == 0 || mpz_cmp_ui (u, SHRT_MAX) <= 0) &&
+    mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, SHRT_MIN)) <= 0;
+}
+
+int
+mpz_fits_ushort_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, USHRT_MAX) <= 0;
+}
+
 long int
 mpz_get_si (const mpz_t u)
 {
diff -r 805304ca965a mini-gmp/mini-gmp.h
--- a/mini-gmp/mini-gmp.h	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.h	Sun Apr 19 09:47:39 2020 +0200
@@ -244,6 +244,10 @@
 
 int mpz_fits_slong_p (const mpz_t);
 int mpz_fits_ulong_p (const mpz_t);
+int mpz_fits_sint_p (const mpz_t);
+int mpz_fits_uint_p (const mpz_t);
+int mpz_fits_sshort_p (const mpz_t);
+int mpz_fits_ushort_p (const mpz_t);
 long int mpz_get_si (const mpz_t);
 unsigned long int mpz_get_ui (const mpz_t);
 double mpz_get_d (const mpz_t);

I attach a patch with also the tests (somewhat redundant).
Ĝis, m

diff -r 805304ca965a mini-gmp/mini-gmp.c
--- a/mini-gmp/mini-gmp.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.c	Sun Apr 19 09:48:28 2020 +0200
@@ -1565,6 +1565,32 @@
   return us >= 0 && mpn_absfits_ulong_p (u->_mp_d, us);
 }
 
+int
+mpz_fits_sint_p (const mpz_t u)
+{
+  return (INT_MAX + INT_MIN == 0 || mpz_cmp_ui (u, INT_MAX) <= 0) &&
+    mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0;
+}
+
+int
+mpz_fits_uint_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, UINT_MAX) <= 0;
+}
+
+int
+mpz_fits_sshort_p (const mpz_t u)
+{
+  return (SHRT_MAX + SHRT_MIN == 0 || mpz_cmp_ui (u, SHRT_MAX) <= 0) &&
+    mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, SHRT_MIN)) <= 0;
+}
+
+int
+mpz_fits_ushort_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, USHRT_MAX) <= 0;
+}
+
 long int
 mpz_get_si (const mpz_t u)
 {
diff -r 805304ca965a mini-gmp/mini-gmp.h
--- a/mini-gmp/mini-gmp.h	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.h	Sun Apr 19 09:48:28 2020 +0200
@@ -244,6 +244,10 @@
 
 int mpz_fits_slong_p (const mpz_t);
 int mpz_fits_ulong_p (const mpz_t);
+int mpz_fits_sint_p (const mpz_t);
+int mpz_fits_uint_p (const mpz_t);
+int mpz_fits_sshort_p (const mpz_t);
+int mpz_fits_ushort_p (const mpz_t);
 long int mpz_get_si (const mpz_t);
 unsigned long int mpz_get_ui (const mpz_t);
 double mpz_get_d (const mpz_t);
diff -r 805304ca965a mini-gmp/tests/t-signed.c
--- a/mini-gmp/tests/t-signed.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/tests/t-signed.c	Sun Apr 19 09:48:28 2020 +0200
@@ -1,6 +1,6 @@
 /* Exercise some mpz_..._si functions.
 
-Copyright 2013, 2016 Free Software Foundation, Inc.
+Copyright 2013, 2016, 2020 Free Software Foundation, Inc.
 
 This file is part of the GNU MP Library test suite.
@@ -197,9 +197,152 @@
 }
 
 void
+try_fits_utype_p (void)
+{
+  mpz_t x;
+  mpz_init (x);
+  if (!mpz_fits_ulong_p (x))
+    {
+      printf ("mpz_fits_ulong_p (0) false!\n");
+      abort ();
+    }
+  if (!mpz_fits_uint_p (x))
+    {
+      printf ("mpz_fits_uint_p (0) false!\n");
+      abort ();
+    }
+  if (!mpz_fits_ushort_p (x))
+    {
+      printf ("mpz_fits_ushort_p (0) false!\n");
+      abort ();
+    }
+  mpz_set_si (x, -1);
+  if (mpz_fits_ulong_p (x))
+    {
+      printf ("mpz_fits_ulong_p (- 1) true!\n");
+      abort ();
+    }
+  if (mpz_fits_uint_p (x))
+    {
+      printf ("mpz_fits_uint_p (- 1) true!\n");
+      abort ();
+    }
+  if (mpz_fits_ushort_p (x))
+    {
+      printf ("mpz_fits_ushort_p (- 1) true!\n");
+      abort ();
+    }
+  mpz_set_ui (x, ULONG_MAX);
+  if (!mpz_fits_ulong_p (x))
+    {
+      printf ("mpz_fits_ulong_p (ULONG_MAX) false!\n");
+      abort ();
+    }
+  mpz_add_ui (x, x, 1);
+  if (mpz_fits_ulong_p (x))
+    {
+      printf ("mpz_fits_ulong_p (ULONG_MAX + 1) true!\n");
+      abort ();
+    }
+  mpz_set_ui (x, UINT_MAX);
+  if (!mpz_fits_uint_p (x))
+    {
+      printf ("mpz_fits_uint_p (UINT_MAX) false!\n");
+      abort ();
+    }
+  mpz_add_ui (x, x, 1);
+  if (mpz_fits_uint_p (x))
+    {
+      printf ("mpz_fits_uint_p (UINT_MAX + 1) true!\n");
+      abort ();
+    }
+  mpz_set_ui (x, USHRT_MAX);
+  if (!mpz_fits_ushort_p (x))
+    {
+      printf
Re: possible speedup for mpz_nextprime_small
On 2020-03-27 21:36, Seth Troisi wrote:
> note I didn't get the time to look over the patch so it might be extra
> rough.

-------- Original message --------
Subject: Re: Cleanup mpz_out_str in tests

On 2020-03-26 10:10, Seth Troisi wrote:
> "Why" today was that I had a little time today and long ago Niels had
> given this idea a soft yes.

Sorry but... from my personal point of view this is a contradiction. You
have the time to write an inessential patch, but you do not have the
time to check the code that you ask us to read?

I suggest you invest more of your time in re-reading, improving the
quality of, and refining your code before throwing it to this list.

On my side, I feel I invested enough time to show you how to improve the
quality of your code. It's your turn.

Ĝis, m
Proposed patch for mpn_trialdiv
Ciao,

the current code in mpn/generic/trialdiv.c contains a comment that
suggests removing "one of the outer loop conditions". The patch I attach
removes it by checking only the number of primes, and limiting the
maximum for the latter.

I also added a possible optimisation for a single limb.

A call to mpn_trialdiv should be used also in mpz/pprime_p.c, instead of
the "Do more dividing." loop.

The original code was by Torbjorn. Any comment?

*** ../gmp-head/mpn/generic/trialdiv.c	2018-05-08 06:15:55.711524824 +0200
--- mpn/generic/trialdiv.c	2020-04-03 23:46:03.500943337 +0200
***************
*** 76,84 ****
  #include "trialdivtab.h"
  #undef WANT_dtab
  #undef P
-   {0,0}
  };
  
  static const struct gmp_primes_ptab gmp_primes_ptab[] =
  {
  #define WANT_ptab
--- 76,85 ----
  #include "trialdivtab.h"
  #undef WANT_dtab
  #undef P
  };
  
+ #define DTAB_LINES (numberof (gmp_primes_dtab))
+ 
  static const struct gmp_primes_ptab gmp_primes_ptab[] =
  {
  #define WANT_ptab
***************
*** 88,95 ****
  
  #define PTAB_LINES (sizeof (gmp_primes_ptab) / sizeof (gmp_primes_ptab[0]))
  
- /* FIXME: We could optimize out one of the outer loop conditions if we
-    had a final ptab entry with a huge np field. */
  mp_limb_t
  mpn_trialdiv (mp_srcptr tp, mp_size_t tn, mp_size_t nprimes, int *where)
  {
--- 89,94 ----
***************
*** 101,108 ****
  
    ASSERT (tn >= 1);
  
!   for (i = *where; i < PTAB_LINES; i++)
      {
        ppp = gmp_primes_ptab[i].ppp;
        cps = gmp_primes_ptab[i].cps;
--- 100,136 ----
  
    ASSERT (tn >= 1);
  
!   ASSERT (DTAB_LINES == gmp_primes_ptab[PTAB_LINES - 1].np +
! 	  gmp_primes_ptab[PTAB_LINES - 1].idx);
!   nprimes = MIN (nprimes, DTAB_LINES);
! 
! #if 0
!   if (tn == 1)
!     {
!       dp = gmp_primes_dtab;
!       r = *tp;
!       /* In this branch the value *where is interpreted differently.
! 	 If tn decreases to 1, some primes will be checked again.
! 	 The opposite (tn increases) should not happen.
! 
! 	 THINK: Should we avoid this? Maybe using positive and
! 	 negative values for the two different uses? So that we can
! 	 detect when we switch from one to the other.
!       */
!       for (i = *where; i < nprimes; ++i)
! 	if (r * dp[i].binv <= dp[i].lim)
! 	  {
! 	    *where = i + 1;
! 	    return dp[i].binv;
! 	  }
!       return 0;
!     }
! #endif
! 
!   nprimes -= gmp_primes_ptab[*where].idx;
!   for (i = *where; nprimes > 0; ++i)
      {
+       ASSERT (i < PTAB_LINES);
        ppp = gmp_primes_ptab[i].ppp;
        cps = gmp_primes_ptab[i].cps;
***************
*** 113,118 ****
--- 141,147 ----
  
        /* Check divisibility by individual primes.  */
        dp = &gmp_primes_dtab[idx] + np;
+       ASSERT (idx + np <= DTAB_LINES);
        for (j = -np; j < 0; j++)
  	{
  	  q = r * dp[j].binv;
***************
*** 124,131 ****
  	}
  
        nprimes -= np;
-       if (nprimes <= 0)
- 	return 0;
      }
    return 0;
  }
--- 153,158 ----

Ĝis, m

diff -r 805304ca965a mpn/generic/trialdiv.c
--- a/mpn/generic/trialdiv.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mpn/generic/trialdiv.c	Fri Apr 03 23:49:30 2020 +0200
@@ -76,9 +76,10 @@
 #include "trialdivtab.h"
 #undef WANT_dtab
 #undef P
-  {0,0}
 };
 
+#define DTAB_LINES (numberof (gmp_primes_dtab))
+
 static const struct gmp_primes_ptab gmp_primes_ptab[] =
 {
 #define WANT_ptab
@@ -88,8 +89,6 @@
 
 #define PTAB_LINES (sizeof (gmp_primes_ptab) / sizeof (gmp_primes_ptab[0]))
 
-/* FIXME: We could optimize out one of the outer loop conditions if we
-   had a final ptab entry with a huge np field. */
 mp_limb_t
 mpn_trialdiv (mp_srcptr tp, mp_size_t tn, mp_size_t nprimes, int *where)
 {
@@ -101,8 +100,37 @@
 
   ASSERT (tn >= 1);
 
-  for (i = *where; i < PTAB_LINES; i++)
+  ASSERT (DTAB_LINES == gmp_primes_ptab[PTAB_LINES - 1].np +
+	  gmp_primes_ptab[PTAB_LINES - 1].idx);
+  nprimes = MIN (nprimes, DTAB_LINES);
+
+#if 0
+  if (tn == 1)
+    {
+      dp = gmp_primes_dtab;
+      r = *tp;
+      /* In this branch the value *where is interpreted differently.
+	 If tn decreases to 1, some primes will be checked again.
+	 The opposite (tn increases) should not happen.
+
+	 THINK: Should we avoid this? Maybe using positive and
+	 negative values for the two different uses? So that we can
+	 detect when we switch from one to the other.
+      */
+      for (i = *where; i < nprimes; ++i)
+	if (r * dp[i].binv <= dp[i].lim)
+	  {
+	    *where = i + 1;
+	    return dp[i].binv;
+	  }
+      return 0;
+    }
+#endif
+
+  nprimes -= gmp_primes_ptab[*where].idx;
+  for (i = *where; nprimes > 0; ++i)
+    {
+      ASSERT (i < PTAB_LINES);
       ppp = gmp_primes_ptab[i].ppp;
       cps = gmp_primes_ptab[i].cps;
@@ -113,6 +141,7 @@
 
       /* Check divisibility by individual primes.  */
       dp = &gmp_primes_dtab[idx] + np;
+      ASSERT (idx + np <= DTAB_LINES);
       for (j = -np; j < 0; j++)
 	{
 	  q = r * dp[j].binv;
@@ -124,8 +153,6 @@
 	}
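The `r * dp[i].binv <= dp[i].lim` test in the patch relies on a classic trick for exact-division checks by odd constants. A standalone sketch (not GMP code; names are made up) of the same idea:

```c
#include <stdint.h>

/* For an odd divisor d, precompute binv = d^-1 mod 2^64 and
   lim = (2^64 - 1) / d; then d divides n exactly when
   n * binv (mod 2^64) <= lim, because for divisible n the product is
   exactly n / d, while for non-divisible n it wraps to a large value. */
static uint64_t
binvert64 (uint64_t d)		/* d must be odd */
{
  uint64_t x = d;		/* correct to 3 bits since d*d = 1 mod 8 */
  int i;
  for (i = 0; i < 5; i++)	/* Newton step doubles the correct bits */
    x *= 2 - d * x;
  return x;			/* inverse of d modulo 2^64 */
}

static int
divisible_p (uint64_t n, uint64_t d)
{
  return n * binvert64 (d) <= UINT64_MAX / d;
}
```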
Re: mpz_prevprime
Ciao,

On 2020-03-25 02:25, Seth Troisi wrote:
> I'm back with a new mpz_prevprime patch

> diff -r 805304ca965a doc/gmp.texi
> +@deftypefun int mpz_prevprime (mpz_t @var{rop}, const mpz_t @var{op})
> +If previous prime doesn't exist (e.g. @var{op} < 3), rop is unchanged
> +and 0 is returned.

I suggest "i.e." instead of "e.g.".

> diff -r 805304ca965a mpz/nextprime.c
> +findnext_small (unsigned t, short diff)
[...]
> + ASSERT (t > 0 || (diff < 0 && t > 2));

This is equivalent to (t > 0). Do you mean (t > 2 || (diff > 0 && t > 0))?

> + t = diff > 0 ? ((t + 1) | (t > 1)) :
> +     ((t == 3) ? 2 : ((t - 2) | 1));

Maybe move this to the caller side? Or partially, leaving here just

  ASSERT (t >= 2);
  t |= (t != 2);

> -void
> -mpz_nextprime (mpz_ptr p, mpz_srcptr n)
> +int
> +findnext (mpz_ptr p,

If the function is not public any more, use static.

> - mpz_setbit (p, 0);

This line can still be here, maybe converted to

  *PTR (p) |= 1;

> + /* smaller numbers handled earlier */
> + ASSERT (nbits >= 15);

Not needed. Maybe ASSERT (nbits > 3), because we don't want to handle
very small numbers with this code.

> +int
> +mpz_prevprime (mpz_ptr p, mpz_srcptr n)
> +{
> +  /* First handle tiny numbers */
> +  if (mpz_cmp_ui (n, 2) <= 0)
> +    return 0;
> +
> +  if (mpz_cmp_ui (n, NP_SMALL_LIMIT) < 0)
> +    {
> +      ASSERT (NP_SMALL_LIMIT < UINT_MAX);
> +      mpz_set_ui (p, findnext_small (SIZ (n) > 0 ? mpz_get_ui (n) : 1, -2));
> +      return 2;
> +    }

The line "if (mpz_cmp_ui (n, 2) <= 0)" already handles the case
(SIZ (n) <= 0), so checking if (SIZ (n) > 0) is nonsense. Anyway...
assume that code is reached: what happens if findnext_small (1, -2) is
called?

> diff -r 805304ca965a tests/mpz/t-nextprime.c
[...]
> +refmpz_prevprime (mpz_ptr p, mpz_srcptr t)
[...]
> + if (mpz_cmp_ui (t, 3) <= 0)

The loop after that branch handles also that case, doesn't it?

> test_largegaps ()
> {
[...]
> + mpz_set_ui (n, 3842610773);
[...]
> + mpz_set_ui (n, 18361375334787046697UL);

Those lines can fail at compile time... if 3842610773 does not fit in an
int and 18361375334787046697 not in an unsigned long. E.g. when
sizeof(int) == 4. Use _set_str.

> + mpz_mul_ui (n, n, 4280516017);
[...]
> + mpz_mul_ui (n, n, 3483347771);

...

> I also added large negative tests for mpz_nextprime

To be honest... I do not like this (undocumented) detail of the current
behaviour of _nextprime. I personally do not like the idea to enforce it
with a test...

> and we can enable test_largegaps now that the default gap is smaller

Yes, we should.

I would also check many gaps around 2^{16n}, to check if everything
works correctly when the search crosses the limb boundaries. Maybe
structuring the test with a table {"number(base16?)", gap}, i.e. with
(also) something like:

  {"65521", 16},
  {"4294967291", 20},
  {"281474976710597", 80},
  {"18446744073709551557", 72},
  {"1208925819614629174706111", 78},
  {"79228162514264337593543950319", 78},
  {"5192296858534827628530496329220021", 100},
  {"340282366920938463463374607431768211297", 210},

> +void
> +test_prevprime (gmp_randstate_ptr rands, int reps)
[...]
> + /* Test mpz_prevprime(3 <= n < 2^45) returns 2. */
[...]
> + /* Test mpz_prevprime(n > 2^70) returns 1. */
[...]
> + if ( retval != 1 )
[...]

I do not understand. A test function should fail if a function's
behaviour does not agree with the documented interface, or it can fail
for some other kind of regression that we want to avoid... Your test
will warn us against possible improvements! What if future primality
testing functions return 2 on some large primes?

Ok, I'm sleepy now. I hope I did not write wrong things. Good night!

Ĝis, m
Re: possible speedup for mpz_nextprime_small
Ciao, Il 2020-03-26 01:06 Seth Troisi ha scritto: Marco was interested in what using dynamically allocating primes would look like. I've attached a couple of screenshots. I'd like to understand the reason why the line "dynamic" is 5-10% lower than the other lines in the range above 27... I think a good balance would be to add 70 primes and increase SMALL_LIMIT to 1,000,000 Adding back the primes we just removed, because they are useful again! We would need a new constant to differentiate if (nbits / 2 < NUMBER_OF_PRIMES) and ASSERT (i < NUMBER_OF_PRIMES); Yes. The first is a threshold to switch from one method to another. Should it be tuned on different CPUs? NUMBER_OF_PRIMES is the hard limit for that threshold. Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Cleanup mpz_out_str in tests
Ciao, Il 2020-03-26 02:15 Seth Troisi ha scritto: This cleans up a number of printf(...) mpz_out_str(stdout, 10/16, var); printf(...); and replaces them with gmp_printf(...%Zd..., var); Why? Is the current code not working? In mini-gmp we have mpz_out_str, but we don't have gmp_printf. The code with mpz_out_str is (more) easily reusable with mini-gmp. Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il 2020-03-24 18:54 Seth Troisi ha scritto: On Tue, Mar 24, 2020 at 9:56 AM Marco Bodrato wrote: I propose a variation of your patch, you find it attached, on my computer it's faster. Couple of small notes but otherwise this looks good +/* NP_SMALL_LIMIT = prevprime (LAST_PRIME ^ 2) */ In the light of the day this can be bumped slightly to prevprime(nextprime(LAST_PRIME)^2) - 1 = 316960 I'm not sure. I mean: with that list of primes we can certify primality up to that limit, I agree. But what about the exit condition? The current code considers that a number is prime if (n/prime < prime); if we want to certify numbers above LAST_PRIME^2, we should add one more branch in the loop. + /* Technically this should be prev_prime(LAST_PRIME ^ 2) */ I'd remove this, It's covered by the comment above Right. + if (q < prime) +return t; + if (r == 0) +break; Can this be rearranged to + if (r == 0) + break; + if (q <= prime) + return t; I fear it can't. The case t=3, prime=3 would consider 3 a composite. Moreover I believe that, on some architectures, q can be ready before r is, so the order Torbjörn used should reduce latency. But I may be wrong on that point. The case t=3 can be healed with an if(t<9) return t; just before the loop. Then we should measure speed on different platforms. With the order you propose, we may use NP_SMALL_LIMIT = prevprime (LAST_PRIME * (LAST_PRIME + 1)) right? + ASSERT (i < NUMBER_OF_PRIMES); Should this be placed at the start of the loop or be? + ASSERT (i+1 < NUMBER_OF_PRIMES); The ASSERT on i makes sense just before i is used by the instruction prime+=primegap[i], i.e. at the end of the loop. Doesn't it? If writing the code is not too complex, it may be interesting to test if it is worthwhile. I'll try it out tonight Great! Did you try tweaking with the -p parameter? It can be useful when a measure is "noisy". Nope I had not, using -p3 seems to work just as well. 
I would have used something like -p10 or -p100, but if -p3 is good for you, I'm happy :-) Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Il 2020-03-24 01:10 Seth Troisi ha scritto: (I'm don't think your ulong patch is needed but I can measure at a It's a small patch, but the speed-up it gives is probably too small to be worth adding. My code doesn't use a sieve (and instead checks each number for divisors similar to the old code) because the average gap for small numbers is small 30 / primepi(30) = 11 There is a small error in the code, it does not handle negative numbers correctly. For each divisor your code uses two operations, a product (prime*prime) and a remainder (t % prime). On many architectures computing t/prime and t%prime cost just one operation... that's why I'd use the ideas Torbjorn used in isprime() in dumbmp.c, https://gmplib.org/repo/gmp/rev/7524222ab7d4 currently in bootstrap.c . I propose a variation of your patch, you find it attached, on my computer it's faster. Because the measured threshold for this was much larger (~80 million), it might actually make sense to use a sieve after some threshold If you think that's a 1.5-3x speedup for inputs 16-30bits is important I can try to dynamically generate a longer list of primes If writing the code is not too complex, it may be interesting to test if it is worth. On Mon, Mar 23, 2020 at 2:38 PM Marco Bodrato wrote: It might also be useful to commit tune_nextprime_small which dynamically selects a higher reps count for mpz_nextprime input and helps increase precision. Uhm... j = 20 / s->size; This is a great point, I modified the code so It only changes the outer loop count. I included a screenshot showing how much this reduces variance over multiple runs. Did you try tweaking with the -p parameter? It can be useful when a measure is "noisy". PS: a comment in the current code says /* Larger threshold is faster but takes ~O(n/ln(n)) memory. To be honest, the array sieve needs n/24 bytes, and the array You are correct, let's change the comment by which I mean you :) Done. 
Ĝis, m -- http://bodrato.it/papers/diff -r a7836288d2c0 mpz/nextprime.c --- a/mpz/nextprime.c Fri Mar 20 16:19:07 2020 +0100 +++ b/mpz/nextprime.c Tue Mar 24 17:24:41 2020 +0100 @@ -85,6 +85,9 @@ }; #define NUMBER_OF_PRIMES 100 +#define LAST_PRIME 557 +/* NP_SMALL_LIMIT = prevprime (LAST_PRIME ^ 2) */ +#define NP_SMALL_LIMIT 310243 static unsigned long calculate_sievelimit(mp_bitcnt_t nbits) { @@ -120,6 +123,32 @@ return sieve_limit; } +static unsigned +mpz_nextprime_small (unsigned t) +{ + ASSERT (t > 0); /* Expect t=1 if the operand was smaller.*/ + /* Technically this should be prev_prime(LAST_PRIME ^ 2) */ + ASSERT (t < NP_SMALL_LIMIT); + + /* Start from next candidate (2 or odd) */ + t = (t + 1) | (t > 1); + for (; ; t += 2) +{ + unsigned prime = 3; + for (int i = 0; ; prime += primegap_small[i++]) + { + unsigned q, r; + q = t / prime; + r = t - q * prime; /* r = t % prime; */ + if (q < prime) + return t; + if (r == 0) + break; + ASSERT (i < NUMBER_OF_PRIMES); + } +} +} + void mpz_nextprime (mpz_ptr p, mpz_srcptr n) { @@ -132,18 +161,16 @@ unsigned odds_in_composite_sieve; TMP_DECL; - /* First handle tiny numbers */ - if (mpz_cmp_ui (n, 2) < 0) + /* First handle small numbers */ + if (mpz_cmp_ui (n, NP_SMALL_LIMIT) < 0) { - mpz_set_ui (p, 2); + ASSERT (NP_SMALL_LIMIT < UINT_MAX); + mpz_set_ui (p, mpz_nextprime_small (SIZ (n) > 0 ? mpz_get_ui (n) : 1)); return; } mpz_add_ui (p, n, 1); mpz_setbit (p, 0); - if (mpz_cmp_ui (p, 7) <= 0) -return; - TMP_MARK; pn = SIZ(p); MPN_SIZEINBASE_2EXP(nbits, PTR(p), pn, 1); ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Il 2020-03-23 22:38 Marco Bodrato ha scritto: Using unsigned long to compute the modulus is obviously faster than many calls to mpz_cdiv_ui. I tried a patch that surely is not as fast as yours, but should speed-up all numbers that fits in a single unsigned long. The speed-up seems small, I'm not sure it's worth. ... it is not very interesting, but I attach the patch I wrote. Ĝis, mdiff -r a7836288d2c0 mpz/nextprime.c --- a/mpz/nextprime.c Fri Mar 20 16:19:07 2020 +0100 +++ b/mpz/nextprime.c Mon Mar 23 22:39:43 2020 +0100 @@ -202,6 +202,29 @@ memset (composite, 0, odds_in_composite_sieve); prime = 3; + if (mpz_fits_ulong_p (p)) + { + unsigned long p0 = mpz_get_ui (p); + + for (i = 0; i < prime_limit; i++) + { + m = p0 % prime; + /* Only care about odd multiplies of prime. */ + if (m & 1) + m = (prime - m) >> 1; + else + { + composite[0] |= m == 0; + m = prime - (m >> 1); + } + + /* Mark off any composites in sieve */ + for (; m < odds_in_composite_sieve; m += prime) + composite[m] = 1; + prime += primegap[i]; + } + } + else for (i = 0; i < prime_limit; i++) { m = mpz_cdiv_ui(p, prime); ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il 2020-03-22 10:00 Seth Troisi ha scritto: I measure the regression as ~5-15% for input < ~10,000 And no regression for larger input? Good. I have a proposed patch which uses int, and trial div up to some threshold this appears to be 5x faster. A couple of question arises. 5x faster than what? Than the patch I proposed or than the old/current code? Which one is the good idea? using int or avoiding the sieve? Using unsigned long to compute the modulus is obviously faster than many calls to mpz_cdiv_ui. I tried a patch that surely is not as fast as yours, but should speed-up all numbers that fits in a single unsigned long. The speed-up seems small, I'm not sure it's worth. The cut off point as I measure it is ~80,000,000 but requires increasing the primegap_small list to ~1100 entries I'm not sure what you think about a list of that length. Well, the cut off point is surely different on different architectures. It should be tuned. Moreover: a large list ... can't it be generated on the fly? If we want a larger table, in my opinion it should be an improvement for all the functions that need primes (_primorial, _bin, _fac...). It might also be useful to commit tune_nextprime_small which dynamically selects a higher reps count for mpz_nextprime input and helps increase precision. Uhm... j = 20 / s->size; This means that for size=10 you compute 2 primes: all primes between 2^10 and 2^18. for size=11: all primes between 2^11 and 2^18. for size=12... and so on up to 2^18. This is not exactly representative of the speed of nextprime on operands around 2^10, 2^11... and so on. PS: a comment in the current code says /* Larger threshold is faster but takes ~O(n/ln(n)) memory. To be honest, the array sieve needs n/24 bytes, and the array primegap_tmp uses primepi(n) ~= n/ln(n) bytes. And asymptotically the sum of both is O(n). Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Il 2020-03-21 11:42 Seth Troisi ha scritto: I see this was pushed, with a few more polishing tweaks. Yes, but there are bad news, it introduces a speed regression for small numbers. I see you also added more testing in tests/devel/primes.c which is a great triple check. It looks like the usage on line 21-28 / 399 wasn't updated. Not yet updated, you are right. I also added a target for tune/speed. tune/speed -s mpz_nextprime_1. measure the time needed to compute the first primes larger than , and divides by . If on my environment I try $ tune/speed -s1-4000 -t2 mpz_nextprime_1.16 before, and after the patch, I see that nextprime on small numbers is now slower. Before introducing new functions, may I suggest to try to speed up the function also for small numbers? You measured how much it is worth sieving before using the Miller-Rabin (actually BPSW) test. But for really small numbers, the sieve alone can be enough and the expensive test can be totally skipped. I attach a possible patch for this purpose. Ĝis, mdiff -r a7836288d2c0 mpz/nextprime.c --- a/mpz/nextprime.c Fri Mar 20 16:19:07 2020 +0100 +++ b/mpz/nextprime.c Sat Mar 21 15:16:27 2020 +0100 @@ -85,6 +85,11 @@ }; #define NUMBER_OF_PRIMES 100 +#define LAST_PRIME 557 +/* SIEVE_ONLY_GAP = maximal gap between primes < LAST_PRIME^2, + We are between 155921+86=156007 and 360653+96=360749 . 
+*/ +#define SIEVE_ONLY_GAP 86 static unsigned long calculate_sievelimit(mp_bitcnt_t nbits) { @@ -130,6 +135,7 @@ mp_bitcnt_t nbits; int i, m; unsigned odds_in_composite_sieve; + int sieve_only; TMP_DECL; /* First handle tiny numbers */ @@ -141,10 +147,37 @@ mpz_add_ui (p, n, 1); mpz_setbit (p, 0); - if (mpz_cmp_ui (p, 7) <= 0) -return; + TMP_MARK; - TMP_MARK; + if (mpz_cmp_ui (p, LAST_PRIME*LAST_PRIME - SIEVE_ONLY_GAP) <= 0) +{ + unsigned long pl, cp; + int i; + primegap = primegap_small; + + pl = mpz_get_ui (p); + cp = 3; + i = 0; + if (mpz_cmp_ui (p, LAST_PRIME) <= 0) + { + for ( ; cp < pl; cp += primegap[i++]) + ; + mpz_set_ui (p, cp); + return; + } + + odds_in_composite_sieve = SIEVE_ONLY_GAP / 2; + pl += SIEVE_ONLY_GAP - 2; + do { + cp += primegap[i++]; + } while (cp*cp < pl); + + prime_limit = i; + sieve_only = 1; +} + else +{ +/* TODO: Reindent the following code. */ pn = SIZ(p); MPN_SIZEINBASE_2EXP(nbits, PTR(p), pn, 1); if (nbits / 2 <= NUMBER_OF_PRIMES) @@ -192,6 +225,9 @@ /* Corresponds to a merit 14 prime_gap, which is rare. */ odds_in_composite_sieve = 5 * nbits; + sieve_only = 0; +} + /* composite[2*i] stores if p+2*i is a known composite */ composite = TMP_SALLOC_TYPE (odds_in_composite_sieve, char); @@ -226,7 +262,7 @@ difference = 0; /* Miller-Rabin test */ - if (mpz_millerrabin (p, 25)) + if (sieve_only || mpz_millerrabin (p, 25)) { TMP_FREE; return; ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il 2020-03-16 02:23 Seth Troisi ha scritto: On Sun, Mar 15, 2020 at 4:38 PM Marco Bodrato wrote: May I write a few more comments? Always, my opinion about being ready is just that it's passed the bar of being good enough not that it's perfect. I'm tempted to simply commit your proposal and then change it :-) But if we keep on discussing here, we can improve the code with the opinions of two people... I see now: + if (nbits <= 32) + incr_limit = 336; Probably, we should change the name of the variable to odd_candidates_in_sieve, then you would probably have written: if (nbits <= 32) odd_numbers_in_sieve = 168; Renamed to odds_in_composite_sieve Of course the name is not essential, the values are :-) And they seem correct to me, now. Then I wonder if the cost of the /* Sieve next segment */ step [...] need to allocate the possibly huge next_mult array? You're correct, this reduces the memory use by 80% Moreover the code has now the following structure: SIEVE; for (;;) { CHECK_AND_POSSIBLY_BREAK; SIEVE; } There is an unneeded code duplication, it should be: for (;;) { SIEVE; CHECK_AND_POSSIBLY_BREAK; } If you need to store larger gaps, you can also save in the array gap>>1, because they all are even. The prime limit will be able to go beyond 2^37... Updated comment for how this could be done in the future. The first gap larger than 500, if I'm not wrong, is 304599508537+514=304599509051 I'd expect a condition like + if (nbits <= numberof (primegap_small) * 2 + 1) + { + primegap = primegap_small; + prime_limit = nbits / 2; + } Done I read: + if (2 * nbits <= NUMBER_OF_PRIMES) +{ + primegap = primegap_small; + prime_limit = nbits / 2; +} But ... this means that at most one fourth of the constants in primegap_small are used, am I wrong? Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Moving LOOP_ON_SIEVE_* macros to gmp-impl.h
Ciao, Il 2020-03-16 04:36 Seth Troisi ha scritto: Per Marco's comments in my prev_prime/next_prime patch I moved some of the primesieve macros and functions to gmp-impl.h There are two reasons why I never moved those functions and macros to gmp-impl.h, two aspects of one problem: the interface is not clean. First: I'm not sure I like macros that open a loop and don't close it... Second: n_to_bit() is not injective, obviously. E.g. n_to_bit(7) = 1, n_to_bit(10) = 1 . This is not a problem when its value is used for n_to_bit (), but generates confusion if n_to_bit() is used on a value that is not carefully chosen... The first, is maybe just an opinion of mine, do you think those macros are reasonably clean? The second, should be healed somehow. The easier way probably is to write two different functions: static mp_limb_t n_to_bit_end (mp_limb_t n) { return ((n-5)|1)/3U; } static mp_limb_t n_to_bit_start (mp_limb_t n) { return (n-2-n%2)/3U; } bit_to_n (renamed sieve_bit_to_n) id_to_n (renamed sieve_id_to_n) n_to_bit (renamed sieve_n_to_bit) Renaming is a good idea, IMO. This allows the four (soon to be five) identical copies in bin_uiui, oddfac_1, primorial_ui, and tests/devel/primes to be cleaned up. Uhm I have (slightly) changed the macros. Not the interface. It's not clear where this should be documented, if someone tells Niels' answer is perfect. Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il 2020-03-15 00:07 Seth Troisi ha scritto: New patch which is cleaned up and IMO ready to commit May I write a few more comments? On Mon, Mar 9, 2020 at 5:15 PM Seth Troisi wrote: I also dislike the boiler plate of the macros but I didn't Those macros should probably be moved to gmp-impl.h, to we avoid duplication. For this to be possible, we should avoid having different versions of the same macros in the different files... When nbits is small (IMHO the most common case) we'd like to be able to fallback to a constant array which I don't think exist for primesieve (because gmp_limb_bits isn't constant). Only the first "limb" of the sieve is hard-coded in primesieve.c, but it is possible to generate a larger "seed" at configure time, if we want. Why not using a variable for INCR_LIMIT? I see now: + if (nbits <= 32) +incr_limit = 336; + else if (nbits <= 64) +incr_limit = 1550; + else +/* Corresponds to a merit 10 prime_gap, which is rare. */ +incr_limit = 7 * nbits / 2; Probably, we should change the name of the variable to odd_candidates_in_sieve, then you would probably have written: if (nbits <= 32) odd_numbers_in_sieve = 168; else if (nbits <= 64) odd_numbers_in_sieve = 775; else ... Then I wonder if the cost of the /* Sieve next segment */ step is so high that we really have to try hard to avoid it. And if we are able to reduce the probability of its use to very unlikely, do we then really need to allocate the possibly huge next_mult array? + mpz_root (tmp, tmp, 2); I'd use mpz_sqrt. + // TODO: break the rare gap larger than this into gaps <= 255 If you need to store larger gaps, you can also save in the array gap>>1, because they all are even. The prime limit will be able to go beyond 2^37... + /* Avoid having a primegap > 255, first occurence 436,273,009. */ + ASSERT( 4000 < sieve_limit && sieve_limit < 43600 ); Why do you need to check (4000 < sieve_limit)? 
I'd expect a condition like + if (nbits <= numberof (primegap_small) * 2 + 1) +{ + primegap = primegap_small; + prime_limit = nbits / 2; +} so that all primegap_small can be used. If you think this threshold is too high, then primegap_small can be reduced... Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function
Ciao, Il Lun, 9 Marzo 2020 12:30 pm, Niels Möller ha scritto: > Marco Bodrato writes: >> For the style, I do not know which one is better: >> "if(c){v=val}else{v=0};" or "v=0;if(c){v=val};" > I don't have a very strong opinion on this point, but when it's easy to > do, I prefer to assign once on each code path. Agreed. Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function
Ciao, Il Lun, 9 Marzo 2020 2:08 pm, Niels Möller ha scritto: > Marco Bodrato writes: > >> -gmp_free_func (rden, 0); >> +gmp_free_func (rden, lden * sizeof (char)); > > sizeof (char) == 1 by definition, so no need to multiply with it. Of course you are right! Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il 2020-03-09 11:59 Seth Troisi ha scritto: On Mon, Mar 9, 2020 at 2:05 AM Marco Bodrato wrote: Ciao, Il 2020-03-09 02:56 Seth Troisi ha scritto: It's highly possible I misunderstand how these macros are supposed to be used. I also dislike the boiler plate of the macros but I didn't like When I say "that code is messy", I mean my code. And you agree :-) those macros are just easy to misunderstand :-/ If you have suggestions or code pointers for a better pattern I'd appreciate that. Did you try to use the gmp_primesieve function directly? I'm not sure what you're referencing here, I tried with I'm proposing to get rid of primegap, and use something like the following. This is just a copy-paste from your code, not a really working sequence! + __sieve = TMP_ALLOC_LIMBS (primesieve_size (sieve_limit)); + prime_limit_pre = gmp_primesieve(__sieve, sieve_limit); + next_mult = TMP_ALLOC_TYPE (prime_limit, unsigned int); [...] + /* Invert p for modulo */ + mpz_neg(p, p); [..handle 3 outside the loop] +LOOP_ON_SIEVE_BEGIN (prime, n_to_bit (5), n_to_bit (sieve_limit), 0, __sieve); + m = mpz_fdiv_ui(p, prime); + /* Only care about odd multiplies (even indexes) */ + if (m & 1) +m += prime; + + /* mark off any composites in sieve */ + for (; m < INCR_LIMIT; m += 2*prime) +sieve[m] = 1; + next_mult[i] = m; +LOOP_ON_SIEVE_END; + mpz_neg(p, p); [...] + /* Sieve next segment */ [..handle 3] + memset(sieve, 0, INCR_LIMIT); + LOOP_ON_SIEVE_BEGIN (prime, n_to_bit (5), n_to_bit (sieve_limit), 0, __sieve); + m = next_mult[i] - INCR_LIMIT; + for (; m < INCR_LIMIT; m += prime * 2) +sieve[m] = 1; + next_mult[i] = m; + LOOP_ON_SIEVE_END; Is there a way to unalloc a TMP_ALLOC_LIMBS before TMP_FREE? No, if you want to unalloc, you must use __GMP_ALLOCATE_FUNC_TYPE, __GMP_REALLOCATE_FUNC_TYPE, and __GMP_FREE_FUNC_TYPE. 
Sieve next segment is rare, the average gap is ln(n), if INCR_LIMIT was changed to be a variable it could be set so /*sieve next segment*/ happened on average less than once, and then Why not using a variable for INCR_LIMIT? In the array char *sieve, you use only the even entries. Maybe you can reduce its size by half, and remove some "*2" around the code? it's only 4000 entries so I wasn't bothering at this time, but it I agree. PS: did you consider mpz_Cdiv_ui, instead of _neg,_Fdiv_ui,_neg ? Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function
Ciao, Il 2020-03-07 21:27 minux ha scritto: All comments addressed, except as noted below. I also fixed similar issues in mini-mpq.c changes. On Sat, Mar 7, 2020 at 2:26 PM Niels Möller wrote: minux writes: I'm not that familiar with the mpq functions. I hope Marco can comment. I had problems with my e-mail adress, but I'm here again :-) Maybe there are some unneeded initialisations = NULL, but details can also be refined later... There is an undocumented "feature" that the proposed patch breaks: "safe" failure when the base is out of range. The question is, should we keep it or not, document it or not, should GMP and mini-gmp agree? I attach a possible patch for tests that check that feature. > @@ -4199,7 +4208,7 @@ > - size_t i, sn; > + size_t i, sn, osn = 0; > @@ -4220,15 +4229,15 @@ >if (!sp) > -sp = (char *) gmp_xalloc (1 + sn); > +sp = (char *) gmp_xalloc (osn = 1 + sn); I'd prefer if (!sp) { osn = 1 + sn; sp = (char *) gmp_xalloc (osn); } else osn = 0; (and drop the zero initialization at the declaration above). There are many other places where we initialize a variable with a "default" and we conditionally change the value... I agree with the separate line for "osn = 1 + sn". For the style, I do not know which one is better: "if(c){v=val}else{v=0};" or "v=0;if(c){v=val};" Ĝis, mdiff -r c44538397385 mini-gmp/tests/t-mpq_str.c --- a/mini-gmp/tests/t-mpq_str.c Wed Feb 12 20:48:35 2020 +0100 +++ b/mini-gmp/tests/t-mpq_str.c Mon Mar 09 10:07:10 2020 +0100 @@ -1,6 +1,6 @@ /* -Copyright 2012-2014, 2016, 2018 Free Software Foundation, Inc. +Copyright 2012-2014, 2016, 2018, 2020 Free Software Foundation, Inc. This file is part of the GNU MP Library test suite. @@ -144,7 +144,7 @@ char *ap; char *bp; char *rp; - size_t rn, arn; + size_t rn; mpq_t a, b; @@ -173,7 +173,6 @@ } rn = strlen (rp); - arn = rn - (rp[0] == '-'); bp = mpq_get_str (NULL, (i&1 || base > 36) ? 
base: -base, a); if (strcmp (bp, rp)) @@ -246,6 +245,16 @@ free (rp); testfree (bp); } + + /* Check behaviour (undocuented, but conforming with GMP) when + base is out of range. */ + base = (i&1) ? 63: -37; + if (mpq_get_str (bp, base, a)) + { + fprintf (stderr, "mpz_get_str did not fail, as expected:\n"); + fprintf (stderr, " base = %d\n", base); + abort (); + } } mpq_clear (a); mpq_clear (b); diff -r c44538397385 tests/mpz/convert.c --- a/tests/mpz/convert.c Wed Feb 12 20:48:35 2020 +0100 +++ b/tests/mpz/convert.c Mon Mar 09 10:07:10 2020 +0100 @@ -103,11 +103,10 @@ mpz_urandomb (bs, rands, 32); bsi = mpz_get_ui (bs); - base = bsi % 62 + 1; - if (base == 1) - base = 0; + base = bsi % 62; + base += (base != 0); - str = mpz_get_str ((char *) 0, base, op1); + str = mpz_get_str ((char *) 0, (i&1 || base > 36) ? base: -base, op1); mpz_set_str_or_abort (op2, str, base); if (mpz_cmp (op1, op2)) @@ -122,6 +121,15 @@ (*__gmp_free_func) (str, strlen (str) + 1); + /* Check behaviour (undocuented, but safe) when + base is out of range. */ + if (mpz_get_str (str, (i&1) ? 63: -37, op1)) + { + fprintf (stderr, "ERROR, mpz_get_str did not return NULL for base out of range, in test %d\n", i); + fprintf (stderr, "base = %d\n", (i&1) ? 63: -37); + abort (); + } + /* 2. Generate random string and convert to mpz_t and back to a string again. */ mpz_urandomb (bs, rands, 32); ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il 2020-03-09 02:56 Seth Troisi ha scritto: From Marco Bodrato's analysis I got inspired and modified nextprime to Your analysis is much deeper than mine :-) I do not have much time now... but I glanced at the code and write here a few comments. You can see some of my thresholds and testing input in this google sheet[1]. The end result is sieve up ~ LOG2(N) ^ (20/7) / 3268 Funny numbers :-) In the code, you compute LOG2(N) ^ (20/7) / 3268. (A comment says "/ 1880") Then, if the result is greater than 43000, you cut it. Cutting seems a good idea, but I'd do the reverse: if nbits is larger than (43000*3268) ^ (7/20) = 17940, set limit to 43000, compute otherwise. For 10,000 bit numbers this sieves up to 80 million (which takes ~4 seconds but reduces nextprime from ~300seconds to ~200 seconds) Seems interesting :-) I cleaned up the code without any of the debug printing in the 3rd patch nextprime_sieve_clean.patch which I'd love people to consider Some random comments: If you use TMP_DECL, then you don't need to use also TMP_SDECL. The same for TMP_MARK and TMP_SMARK; or TMP_FREE and TMP_SFREE. I'm happy to see that you used gmp_primesieve and the macros for it, but... ... maybe that code is too messy? I mean, my code with the LOOP_ON_SIEVE_ macros... You did not use that sieve to directly loop on primes, but you used it to loop on primes for building a primegap table, and then use that table to loop on primes... Did you try to use the gmp_primesieve function directly? If sieve_limit = 43000, you are allocating 17M for mp_limb_t *sieve, then another 22M for primegap_tmp, and both are kept in memory. I believe that looping with primegap is faster, but... how many times do we need too loop, on average? I mean, how often is the /*Sieve next segment*/ code used? In the array char *sieve, you use only the even entries. Maybe you can reduce its size by half, and remove some "*2" around the code? 
Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Overflow in mpz_cmp
Ciao, Il 2020-02-18 00:35 Guillaume Melquiond ha scritto: Le 17/02/2020 à 22:59, Marco Bodrato a écrit : Did you apply formal verification to other functions? Did they succeed? Here is the list of verified functions: mpn_powm mpn_sqrtrem Those are particularly interesting for me! I recently committed some special flavours of powm: special cases for base=2 or when the base is a single limb. It would be really interesting to check if they are formally correct. Moreover, a year ago, or maybe more, I wrote a possible variation for sqrt1. I never moved it to the main repository, because I did not have the time to analyse its correctness. Again, a tool to check it would be nice. A few more details can be found there: https://gitlab.inria.fr/why3/whymp I'll look into your work, thanks. Ĝis, m -- http://bodrato.it/papers/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Overflow in mpz_cmp
Ciao, Il 2020-02-13 08:54 Guillaume Melquiond ha scritto: Note that mpz_cmpabs does not have any overflow issue, since absolute sizes are subtracted there. So, except maybe for homogeneity with mpz_cmp, it can keep using the optimized version. I pushed a patch for both cmp and cmpabs. So that the second is ready for a change in the type of the field _mp_size for the mpz_t variables :-) But you are right, the patch to cmpabs is not needed for the current code. Il 2020-02-10 18:25 Guillaume Melquiond ha scritto: The bug was noticed when formally verifying this function. Did you apply formal verification to other functions? Did they succeed? Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao, Il Mer, 12 Febbraio 2020 7:26 am, Niels Möller ha scritto: > Marco Bodrato writes: >> Maybe my estimates are wrong. If they are not, the limit for trial >> division should be increased more than linearly on the size of the > And current code uses > prime_limit = nbits / 2; > Why the constant "/ 2"? No idea, this code seems to be written by > Torbjörn's and me a decade ago. Before the rewrite, the code was void mpz_nextprime (mpz_ptr p, mpz_srcptr t) { mpz_add_ui (p, t, 1L); while (! mpz_probab_prime_p (p, 5)) mpz_add_ui (p, p, 1L); } Which had the only benefit (from my personal point of view) of returning the next (possibly negative) probab_prime when a negative value was given as an input :-) The "/2" is one of the many constant that should be interesting to tune, but it's not easy to understand exactly how to, because we do not even really know if we need a multiplicative constant or some other curve. >> A possible error in my analysis: I assumed a "constant cost" for each >> new prime. That's true when updating the residues, but each new number >> in the initial loop on _tdiv_ui imply a cost that is linear on the >> bit-size of the number x. > And for a small prime gap (common case), this cost should be the main > part of the sieving. So it would make sense to optimize it. If we also > use the ptab, we could compute modulo single-limb prime products using > precomputed constants. Yes, of course. And the next question will be: should we generate on the fly an even larger table of primes when the required size grows beyond the precomputed table? Ĝis, m ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_prevprime
Ciao,

On 2020-02-10 21:01, ni...@lysator.liu.se wrote:
> On my laptop, it gives a speedup of about 25% for larger sizes. Not sure
> how to tune for small sizes, but clearly the old code clamping the size
> of the prime table based on the bit size is better than doing nothing.
> The computation of all the moduli could be sped up by using ptab, and by
> using precomputed reciprocals. Not clear to me how much the speed of
> that part matters. I haven't run any profiling.

How often do numbers pass the sieving if we use trial division up to
primelimit? Approximately 1/(ln(primelimit)*e^g), where g is the
Euler-Mascheroni constant. Right?

Assume we had primelimit = e^n and we increase it to e^(n+1). The number of
primes approximately was e^n/n; now it is e^(n+1)/(n+1). 1/(n*e^g) of the
candidates passed trial division; now only 1/((n+1)*e^g) pass it.

For the numbers that did not pass trial division, nothing changes. For the
numbers that did pass trial division, we add e^(n+1)/(n+1) - e^n/n tests
= e^n*(e-1-1/n)/(n+1). Can we assume a constant cost for each one of these
tests? But for one out of every (n+1) of them, we can avoid any further test
(they are detected by the new trial divisions). Can we assume that the cost
of the additional test is basically the cost of Miller-Rabin? Let's call
this cost MR(x).

So, increasing primelimit from e^n to e^(n+1), we pay e^n*(e-1-1/n)/(n+1)
and we gain MR(x)/(n+1). It is worth doing if primelimit (e^n) is comparable
with the cost MR(x). But the cost of Miller-Rabin is more than quadratic in
the bit-size of the number x, right?

Maybe my estimates are wrong. If they are not, the limit for trial division
should be increased more than linearly in the size of the numbers that are
tested.

A possible error in my analysis: I assumed a "constant cost" for each new
prime. That's true when updating the residues, but each new number in the
initial loop on _tdiv_ui implies a cost that is linear in the bit-size of
the number x.

This observation again gives an almost linear relation between primelimit
and the bit-size of the number... and says that the speed of the "update
residues" loop is not so relevant, at least for large numbers; but the speed
of the initial moduli computation probably IS relevant.

Ĝis, m
-- 
http://bodrato.it/papers/
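The 1/(ln(primelimit)*e^g) density used in the analysis above is Mertens' third theorem. A quick standalone check (a sketch of mine; the function name is not GMP's):

```c
#include <stdlib.h>

/* Fraction of candidates surviving trial division by every prime up
   to L: the product over primes p <= L of (1 - 1/p).  By Mertens'
   third theorem this is approximately 1/(ln(L) * e^gamma), with gamma
   the Euler-Mascheroni constant -- the estimate quoted above.  */
double
survival_fraction (unsigned long L)
{
  char *composite = calloc (L + 1, 1);   /* sieve of Eratosthenes */
  double frac = 1.0;
  for (unsigned long p = 2; p <= L; p++)
    if (!composite[p])
      {
        frac *= 1.0 - 1.0 / p;
        for (unsigned long q = p * p; q <= L; q += p)
          composite[q] = 1;
      }
  free (composite);
  return frac;
}
```

For L = 10^4 this gives roughly 0.061, within a couple of percent of 1/(ln(10^4)*e^g), so the approximation is already accurate at small limits.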
Re: Overflow in mpz_cmp
Ciao,

On 2020-02-11 14:56, Marc Glisse wrote:
> On Tue, 11 Feb 2020, Niels Möller wrote:
>>   if (usize != vsize)
>>     return (usize > vsize) ? 1 : -1;
> On x86_64, both gcc and clang optimize (usize > vsize) ? 1 : -1 to
> 2 * (usize > vsize) - 1 (a single LEA for gcc, two ADDs for llvm).
> So the generated code may be just as good with the simple code.

We know, optimising is a complex task, and we are not writing a compiler
here. But it is even more amusing to observe how the compilers translate the
last line of mpz/cmp.c:

  return (usize >= 0 ? cmp : -cmp);

in the branches where the compiler "knows" that cmp is zero :-)

Ĝis, m
Re: Overflow in mpz_cmp
Ciao,

On 2020-02-11 14:42, ni...@lysator.liu.se wrote:
> Marco Bodrato writes:
>> diff -r f5601c2a8b11 mpz/cmp.c
>> --- a/mpz/cmp.c Sun Feb 09 16:16:19 2020 +0100
>> +++ b/mpz/cmp.c Tue Feb 11 14:20:39 2020 +0100
>> @@ -35,15 +35,15 @@
>> +  cmp = (usize > vsize) - (usize < vsize);
>> +  if (cmp != 0)
>> +    return cmp;
> I would be tempted to keep it simple,
>   if (usize != vsize)
>     return (usize > vsize) ? 1 : -1;
> It's not clear to me if this is worth micro optimizing, ensuring we get
> only a single branch.

I believe that current compilers on modern architectures should compile the
(usize > vsize) ? 1 : -1 expression with no branches. I tested gcc on amd64
and on arm64, and on both archs your code is compiled with the single
(usize != vsize) branch. Your proposal is simpler to read too.

Ĝis, m
-- 
http://bodrato.it/papers/
Re: Overflow in mpz_cmp
Ciao,

On 2020-02-10 18:25, Guillaume Melquiond wrote:
> When the operand sizes do not match, the mpz_cmp function just returns
> the difference of the signed sizes. Unfortunately, this difference might
> not fit inside the "int" return type, when the numbers are of opposite
> sign.

In mini-gmp we defined a macro:

#define GMP_CMP(a,b) (((a) > (b)) - ((a) < (b)))

We may use the same idea here too. I mean something like the following:

diff -r f5601c2a8b11 mpz/cmp.c
--- a/mpz/cmp.c Sun Feb 09 16:16:19 2020 +0100
+++ b/mpz/cmp.c Tue Feb 11 14:20:39 2020 +0100
@@ -35,15 +35,15 @@
 int
 mpz_cmp (mpz_srcptr u, mpz_srcptr v) __GMP_NOTHROW
 {
-  mp_size_t usize, vsize, dsize, asize;
+  mp_size_t usize, vsize, asize;
   mp_srcptr up, vp;
   int cmp;
 
   usize = SIZ(u);
   vsize = SIZ(v);
-  dsize = usize - vsize;
-  if (dsize != 0)
-    return dsize;
+  cmp = (usize > vsize) - (usize < vsize);
+  if (cmp != 0)
+    return cmp;
 
   asize = ABS (usize);
   up = PTR(u);
diff -r f5601c2a8b11 mpz/cmpabs.c
--- a/mpz/cmpabs.c Sun Feb 09 16:16:19 2020 +0100
+++ b/mpz/cmpabs.c Tue Feb 11 14:20:39 2020 +0100
@@ -36,15 +36,15 @@
 int
 mpz_cmpabs (mpz_srcptr u, mpz_srcptr v) __GMP_NOTHROW
 {
-  mp_size_t usize, vsize, dsize;
+  mp_size_t usize, vsize;
   mp_srcptr up, vp;
   int cmp;
 
   usize = ABSIZ (u);
   vsize = ABSIZ (v);
-  dsize = usize - vsize;
-  if (dsize != 0)
-    return dsize;
+  cmp = (usize > vsize) - (usize < vsize);
+  if (cmp != 0)
+    return cmp;
 
   up = PTR(u);
   vp = PTR(v);

Ĝis, m
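As a standalone illustration (the wrapper function is mine, not GMP code), the macro yields -1, 0 or 1 even for operands whose plain difference would not fit in an int:

```c
#include <limits.h>

/* The mini-gmp idiom: (a > b) and (a < b) each evaluate to 0 or 1, so
   their difference is -1, 0 or 1 and can never overflow -- unlike the
   plain subtraction usize - vsize, whose result may exceed the range
   of int when the two sizes have opposite signs.  */
#define GMP_CMP(a, b) (((a) > (b)) - ((a) < (b)))

/* Hypothetical wrapper mirroring the size comparison in mpz_cmp.  */
int
size_cmp (long usize, long vsize)
{
  return GMP_CMP (usize, vsize);
}
```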
Re: mpz_prevprime
Ciao,

On 2020-01-30 21:12, Seth Troisi wrote:
> I've disabled both gap tests but left the code in so that I can
> uncomment and run it when doing serious changes to the file.

If the gap tests are basically equivalent to running many repetitions of
test_ref, you can trigger them when the number of repetitions is larger than
some adequately large amount. By the way, I pushed this version of your
patch: https://gmplib.org/repo/gmp/rev/3507f80aee0a

> I was referencing my change for mpz_prevprime which is implemented via a
> macro shared with mpz_nextprime

In my opinion, the loop in _{prev,next}prime is not really time-critical.
So... I'd like a different approach, not a macro. Moreover, I'm not the main
developer of the function mpz_nextprime... I'll personally leave to the
other developers the task of revising the patch and possibly integrating it
with the current library.

Ĝis, m
Re: mpz_prevprime
Ciao,

On 2020-02-05 00:59, Seth Troisi wrote:
> I got VERY VERY VERY lucky and found a prime gap > 2^15 with the 4th
> highest merit of ANY KNOWN PRIME GAP.

Great!

> Given Marco's timing of 25 seconds (and a goal of say 3 seconds on that
> machine) the start prime would need to be ~~200 digits.

t...@gmplib.org (Torbjörn Granlund) writes:
> We have tried to stick to a limit of 1 s on a reasonable current
> computer. Most tests should use much less, if possible.

I underline one word TG used: reasonable. Mine was not :-) I was only
measuring how much the patch increases the time used by the test.

But I'm personally still not really understanding which feature is tested by
the patched code and not tested by the current code.

Ĝis, m