Re: question on fixing the mpz_sizeinbase()

2024-03-17 Thread marco.bodrato
Ciao,

Let me answer here a question from gmp-discuss.

On 17 Mar 2024, 01:28, t...@gmplib.org wrote:

> "website.reader"  writes:
>
>  Here's the suggested fix for this command in a C++ code unit
>
This question arises once every couple of years, more or less...
Should we write a new mpz_sizeinbase_exactbutpossiblymuchslower() function?

Ideas:
 - use the vlog[] table we already have in mpn/generic/rootrem.c to compute 
some "decimal digits" of the base-2 logarithm of the number, to discriminate 
more exactly between the "digits" and the "digits+1" cases, based on the 
higher bits of the number. This remains O(1), but should narrow the range of 
cases where we need to
 - compute a sort of ui_powhi_ui (we would finally have a use for mulhi :-), or
 - compute ui_pow_ui and compare fully, giving the exact result.

Do you think it may be useful? It would be more clever than what the user can 
simply do themselves.
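For a single word, the correction a user can already apply looks like this (a hedged plain-C sketch, not GMP code; digits10 is a made-up name). It uses the same two-step idea: a fast O(1) estimate from the bit length, then one integer power comparison to settle the "digits" vs "digits+1" case:

```c
#include <assert.h>

/* Exact count of base-10 digits of v: estimate from the bit length,
   then correct the possible off-by-one with a power comparison. */
unsigned digits10 (unsigned long long v)
{
  if (v == 0)
    return 1;
  unsigned bits = 0;
  for (unsigned long long t = v; t; t >>= 1)
    bits++;
  /* 30103/100000 > log10(2), so est is the digit count or one more */
  unsigned est = (unsigned) (bits * 30103UL / 100000UL) + 1;
  unsigned long long p = 1;
  for (unsigned i = 1; i < est; i++)
    p *= 10;                    /* p = 10^(est-1), fits for est <= 20 */
  if (v < p)
    est--;                      /* v has fewer digits than estimated */
  return est;
}
```

The expensive power computation only runs once, and only the comparison decides the ambiguous case.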
Ĝis,
mb
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread marco.bodrato
Ciao,

On 12 Feb 2024, 11:54, paul.zimmerm...@inria.fr wrote:

> as Niels said, I fear the cost of the tests might make the benefit vanish.
> But to be sure, the only way is to try to implement this idea inside GMP,
> which involves quite some work.
>
But implementing it with the current mpz type is "impossible":
one would have to break the current interface.
Currently, _mp_d is always a pointer AND it always points to a readable limb, 
even if _mp_alloc is zero.

To implement the idea inside GMP, one would have to either:
 - break the current interface, or
 - fully implement a new layer (not mpn, mpz, mpf, but... mpZ?).
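The interface break can be made concrete with a toy sketch (a hypothetical layout, NOT the GMP API; mpZ and all its helpers are made-up names). A tagged word holds either an inline small value or a limb pointer, which is exactly what the current "always a readable limb" invariant forbids:

```c
#include <stdint.h>

/* Hypothetical small-integer layout: low bit set means the value is
   stored inline (non-negative values only, for simplicity); otherwise
   the word is a pointer to limb data (limb pointers are always even).
   With this layout, the pointer can no longer be dereferenced blindly. */
typedef union {
  intptr_t small;               /* (value << 1) | 1 when inline */
  uint64_t *ptr;                /* limb data otherwise */
} mpZ;

static int mpZ_is_small (mpZ z) { return (z.small & 1) != 0; }

static mpZ mpZ_from_small (uintptr_t v)   /* assumes v fits in a word minus 1 bit */
{
  mpZ z;
  z.small = (intptr_t) ((v << 1) | 1);
  return z;
}

static intptr_t mpZ_value (mpZ z) { return z.small >> 1; }
```

Every access then needs a tag test first, which is the cost Niels and Paul fear might make the benefit vanish.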

Ĝis,
mb


Re: mpn_mulhigh_basecase for Broadwell

2024-02-11 Thread marco.bodrato
Thank you Albin,

currently we don't have any mulhi function, so we haven't decided on a possible 
interface for it.
There are some comments in the code where a full mul is used but some "mulhi" 
function could be enough. Many of those places seem to use unbalanced operands, 
and might require the exact result or more limbs than the higher half...

By the way, I like the idea of computing one more limb: that way one may know 
whether the limbs obtained by mulhi are exact or not.

Ĝis,
mb

On 6 Feb 2024, 23:27, albin.ahlb...@gmail.com wrote:

> Hello,
>
> I just wrote an implementation for mpn_mulhigh_basecase for Broadwell-type 
> processors (that is, x86_64 with BMI2 and ADX instructions) based on 
> Torbjörn's `mpn_mullo_basecase'.
>
> It is currently declared in the form
>
>  mp_limb_t flint_mpn_mulhigh_basecase(mp_ptr rp, mp_srcptr xp, mp_srcptr yp, 
> mp_size_t n),
>
> as it was developed for FLINT (Fast Library for Number Theory). Note that 
> `rp' cannot be aliased with `xp' or `yp'.
>
> It will compute an approximation of the uppermost `n + 1' limbs of `{xp, n}' 
> times `{yp, n}', where the uppermost `n' limbs are pushed to `rp[0]', ..., 
> `rp[n - 1]' and the least significant computed limb is returned (via %rax). 
> This returned limb should have an error of something along `n' ULPs.
>
> Note that this differs from MPFR's (private) function `mpfr_mulhigh_n', which 
> computes the approximation of the uppermost `n' limbs into `rp[n]', ..., 
> `rp[2 * n - 1]', where `rp[n]' has an error of something along `n' ULPs at 
> most.
>
> Feel free to change it according to your needs (perhaps you do not want to 
> compute `n + 1' limbs, but rather `n' limbs).
>
> If this code will be used in GMP, feel free to remove the copyright claim for 
> FLINT and put my name (spelled Albin Ahlbäck) in the GMP copyright claim 
> instead.
>
> Just some notes:
>
> - We use our own M4 syntax for the beginning and ending of the function, but 
> it should be easy to translate to GMP's syntax.
> - It currently only works for n > 5 (I believe) as we in FLINT have 
> specialized routines for small n.
> - It would be nice to avoid pushing five registers, and only push four.
> - Reduce the size of the `L(end)' code, and try to avoid multiple jumps 
> therein.
> - Move the code-snippet of `L(f2)' to just above `L(b2)', so that no jump is 
> needed in between. (This currently does not work because `L(end)' as well as 
> this code-snippet is too large for a relative 8-bit jump.)
> - Start out with an mul_1 sequence with just a mulx+add+adc chain, just like 
> in `mpn_mullo_basecase'.
> - Remove the first multiplication in each `L(fX)' and put it in `L(end)' 
> instead.
> - The `adcx' instruction in `L(fX)' can be removed (then one needs to adjust 
> the `L(bX)'-label), but I found it to be slower. Can we remove it and somehow 
> maintain the same performance?
>
> Best,
> Albin
>
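The semantics Albin describes (top n limbs plus one returned guard limb, accurate to about n ULPs) can be modeled in plain C, independent of the assembly. This is a toy sketch with 32-bit limbs, not the proposed code: it keeps only the partial products x[i]*y[j] with i+j >= n-2, so the dropped triangle costs less than n units at the guard limb:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb;          /* 32-bit limbs keep the sketch portable */

/* rp[] += up[] * v, returning the carry limb */
static limb addmul_1 (limb *rp, const limb *up, size_t n, limb v)
{
  uint64_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t t = (uint64_t) up[i] * v + rp[i] + cy;
    rp[i] = (limb) t;
    cy = t >> 32;
  }
  return (limb) cy;
}

/* Approximate the uppermost n+1 limbs of {xp,n} * {yp,n}: only the
   partial products x[i]*y[j] with i+j >= n-2 are accumulated, so the
   returned guard limb (column n-1) is too small by less than n units.
   t is scratch of n+2 limbs, holding columns n-2 .. 2n-1. */
static limb mulhigh (limb *rp, const limb *xp, const limb *yp, size_t n, limb *t)
{
  for (size_t i = 0; i < n + 2; i++)
    t[i] = 0;
  for (size_t i = 0; i < n; i++) {
    size_t j0 = (i + 2 >= n) ? 0 : n - 2 - i;     /* first y index kept */
    size_t off = i + j0 - (n - 2);                /* column offset in t */
    limb cy = addmul_1 (t + off, yp + j0, n - j0, xp[i]);
    for (size_t k = off + (n - j0); cy != 0 && k < n + 2; k++) {
      uint64_t s = (uint64_t) t[k] + cy;          /* propagate the carry */
      t[k] = (limb) s;
      cy = (limb) (s >> 32);
    }
  }
  for (size_t i = 0; i < n; i++)
    rp[i] = t[i + 2];                             /* top n limbs */
  return t[1];                                    /* guard limb */
}
```

Comparing against a full schoolbook product shows the approximation never exceeds the truth and the guard limb stays within n ULPs, which is how the extra limb tells the caller whether the top limbs are exact.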



Re: Tuneup using size_ratio

2023-12-27 Thread marco.bodrato
Ciao,

On 23 Dec 2023, 21:02, ni...@lysator.liu.se wrote:

> marco.bodr...@tutanota.com writes:
>
>> But if I measure the speed with the ratio 0.7 or 0.8 I get quite
>> different thresholds, which one should we use?
>>
> Maybe crossover is rather flat, i.e., a wide range with very small
> difference in performance?
>

Actually, looking at the colored two-dimensional maps we still have under the 
title "New Toom multiplication code for unbalanced operands" on the page 
https://gmplib.org/devel/ , I believe that the lines representing the thresholds 
are more complex than a straight un=constant or vn=constant. (un+vn is more 
appropriate when dealing with FFT, but not necessarily good on other border 
lines.)

I would like to reproduce those maps with the current toom6h and toom8h, to see 
what changed.


> I hope these changes should make it easier to
> investigate such things.
>

I believe those changes are interesting! The threshold system is complex 
for the unbalanced operations, anyway. Those changes suggest we should mark more 
clearly, in the names of the thresholds, which line was used to measure each one.
I mean, while it's probably quite obvious that the toom42-toom63 threshold is 
measured on a 2x1 ratio line, it may not be as obvious for other thresholds.
Moreover, on the 2x1 line we have MUL_TOOM42_TO_TOOM63_THRESHOLD, but we 
are not measuring TOOM63_TO_TOOM6h and TOOM6h_TO_TOOM8h on that line. 
Should we use those functions?

> I intend to push this change in a few days. Have a nice Christmas!
>
Might it be useful to also add the "/r" option for mpn_nussbaumer_mul?

And now, have a happy new year!

Ĝis!
mb


Re: [PATCH] Fix typo in z13 bdiv_dbm1c.

2023-10-12 Thread marco.bodrato
Ciao,

On 10 Oct 2023, 09:14, t...@gmplib.org wrote:

> Stefan Liebler  writes:
>
>  In my case, before calling mpn_divexact_by3c v6 was {0x1, 0x0} and thus 
>


> OK.  We should clearly have tests/*call.asm and tests/*check.c for
> s390_64 and s390_32.  (But the check file might want to re-randomise the
> magic register values after checking them.)
>

Another approach, simpler in my opinion, is to write a test checking the return 
value of the function mpn_divexact_by3c. The manual documents exactly that the 
"return value c satisfy c*b^n + a-i = 3*q, where..."; so we should test the 
return value.

In the library we have some ASSERTs checking whether the return value is zero or 
not, but I believe we do not have any place where the correct return value is 
really checked.
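For illustration, the documented identity can also be exercised outside the library, in plain C with 32-bit limbs (a hedged sketch following the idea of the generic mpn/generic/diveby3.c, not the GMP code itself; divexact_by3c here is a local name):

```c
#include <stdint.h>
#include <stddef.h>

/* Divide {ap,n} minus the carry-in c exactly by 3, 32-bit limbs.
   The returned carry c satisfies c*b^n + a - i = 3*q (manual wording),
   with i the carry-in and b = 2^32. */
static uint32_t divexact_by3c (uint32_t *qp, const uint32_t *ap,
                               size_t n, uint32_t c)
{
  const uint32_t inv3 = 0xAAAAAAABu;        /* 3^-1 mod 2^32 */
  for (size_t k = 0; k < n; k++) {
    uint32_t a = ap[k];
    uint32_t s = a - c;                     /* apply carry-in */
    uint32_t borrow = a < c;
    uint32_t q = s * inv3;                  /* exact quotient limb */
    qp[k] = q;
    /* carry-out: borrow plus how often 3*q wraps past 2^32 */
    c = borrow + (q > 0x55555555u) + (q > 0xAAAAAAAAu);
  }
  return c;
}
```

A test in this spirit multiplies a random q by 3, divides back, and checks both the quotient and the returned carry, which is exactly what the attached test does at the mpn level.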

Does the attached test detect the suspicious code?

Ĝis,
m
/* Test for the mpn_divexact_by3c function.

   Contributed to the GNU project by Marco Bodrato.

Copyright 2023 Free Software Foundation, Inc.

This file is part of the GNU MP Library test suite.

The GNU MP Library test suite is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 3 of the License,
or (at your option) any later version.

The GNU MP Library test suite is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along with
the GNU MP Library test suite.  If not, see https://www.gnu.org/licenses/.  */


#include <stdio.h>
#include <stdlib.h>

#include "gmp-impl.h"
#include "tests.h"

/* Sizes are up to 2^SIZE_LOG limbs */
#ifndef SIZE_LOG
#define SIZE_LOG 7
#endif

#ifndef COUNT
#define COUNT 5000
#endif

#define MAX_N (1 << SIZE_LOG)
#define MIN_N 1

int
main (int argc, char **argv)
{
  mp_ptr np, n3p, rp;
  int count = COUNT;
  int test;
  gmp_randstate_ptr rands;
  TMP_DECL;
  TMP_MARK;

  TESTS_REPS (count, argv, argc);

  tests_start ();
  rands = RANDS;

  np = TMP_ALLOC_LIMBS (MAX_N);
  n3p = TMP_ALLOC_LIMBS (MAX_N + 1);
  rp = 1 + TMP_ALLOC_LIMBS (MAX_N + 2);

  for (test = 0; test < count; test++)
{
  unsigned size_min;
  unsigned size_range;
  mp_size_t n;
  mp_limb_t r_before, r_after;
  mp_limb_t res, expected;

  for (size_min = 1; (1L << size_min) < MIN_N; size_min++)
	;

  /* We generate n in the range MIN_N <= n <= (1 << size_range). */
  size_range = size_min
	+ gmp_urandomm_ui (rands, SIZE_LOG + 1 - size_min);

  n = MIN_N
	+ gmp_urandomm_ui (rands, (1L << size_range) + 1 - MIN_N);

  mpn_random (np, n);

  mpn_random2 (rp - 1, n + 2);
  r_before = rp[-1];
  r_after = rp[n];

  expected = mpn_mul_1 (n3p, np, n, 3);
  if ((n & 1) == 0)
	{
	  res = mpn_divexact_by3 (rp, n3p, n / 2);
	  res = mpn_divexact_by3c (rp + n / 2, n3p + n / 2, n / 2, res);
	}
  else
	res = mpn_divexact_by3c (rp, n3p, n, 0);

  if (rp[-1] != r_before || rp[n] != r_after
	  || mpn_cmp (rp, np, n) != 0 || res != expected)
	{
	  printf ("ERROR in test %d, n = %d\n",
		  test, (int) n);
	  if (rp[-1] != r_before)
	{
	  printf ("before rp:"); mpn_dump (rp - 1, 1);
	  printf ("keep:   "); mpn_dump (&r_before, 1);
	}
	  if (rp[n] != r_after)
	{
	  printf ("after rp:"); mpn_dump (rp + n, 1);
	  printf ("keep:   "); mpn_dump (&r_after, 1);
	}
	  if (res != expected)
	{
	  printf ("retval  :"); mpn_dump (&res, 1);
	  printf ("expected:"); mpn_dump (&expected, 1);
	}
	  mpn_dump (np, n);
	  mpn_dump (n3p, n);
	  mpn_dump (rp, n);

	  abort();
	}
  ASSERT_ALWAYS (res < 3);
}
  TMP_FREE;
  tests_end ();
  return 0;
}


gcc -fanalyzer

2023-09-16 Thread marco.bodrato
Ciao,

gcc complains about a missing call to 'va_end' in scanf/doscan.c and 
printf/doprnt.c .

man stdarg on my Debian reads: "Each invocation of va_copy() must be matched 
by a corresponding invocation of va_end() in the same function."
So gcc is probably right. I assume we never noticed because va_end is 
probably a no-op, but we may want to correct it anyway.

Correcting scanf/doscan.c is quite easy, I'd suggest:

~/src/gmp$ hg diff scanf/doscan.c
diff -r 8225bdfc499f scanf/doscan.c
--- a/scanf/doscan.c    Tue Sep 05 18:32:26 2023 +0200
+++ b/scanf/doscan.c    Sat Sep 16 18:40:28 2023 +0200
@@ -761,6 +761,7 @@
 }
 
  done:
+  va_end (ap);
   (*__gmp_free_func) (alloc_fmt, alloc_fmt_size);
   return fields;
}
~/src/gmp$

On the printf/doprnt.c side, I'm not sure.
There is a "va_copy (ap" at the beginning, so we can va_end before returning.
Each loop starts with a "va_copy (this_ap" and can end with a corresponding 
va_end.
But last_ap is va_copy-ed again and again; should we insert a va_end before 
each new copy?
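The required pairing can be illustrated with a minimal self-contained function (a sketch mirroring the doprnt pattern of re-copying a list inside a loop; sum_ints is a made-up name):

```c
#include <stdarg.h>

/* Each va_copy is matched by a va_end in the same function; the copy
   that is re-taken on every iteration is ended first, as in the patch. */
static int sum_ints (int n, ...)
{
  va_list ap, last;
  int s = 0;
  va_start (ap, n);
  va_copy (last, ap);
  for (int i = 0; i < n; i++) {
    s += va_arg (ap, int);
    va_end (last);              /* end the previous copy... */
    va_copy (last, ap);         /* ...before copying again */
  }
  va_end (last);
  va_end (ap);
  return s;
}
```

The same shape answers the question: yes, a va_end before each new va_copy of last_ap keeps every copy matched.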

I attach a possible patch for printf/doprnt.c .

Ĝis,
m
diff -r 8225bdfc499f printf/doprnt.c
--- a/printf/doprnt.c	Tue Sep 05 18:32:26 2023 +0200
+++ b/printf/doprnt.c	Sat Sep 16 18:59:51 2023 +0200
@@ -340,6 +340,7 @@
 		ret = __gmp_doprnt_integer (funs, data, &param, gmp_str);
 		 __GMP_FREE_FUNC_TYPE (gmp_str, strlen(gmp_str)+1, char);
 		DOPRNT_ACCUMULATE (ret);
+		va_end (last_ap);
 		va_copy (last_ap, ap);
 		last_fmt = fmt;
 	  }
@@ -372,6 +373,7 @@
 	  DOPRNT_ACCUMULATE (__gmp_doprnt_mpf (funs, data, &param,
 		   GMP_DECIMAL_POINT,
 		   va_arg (ap, mpf_srcptr)));
+	  va_end (last_ap);
 	  va_copy (last_ap, ap);
 	  last_fmt = fmt;
 	  break;
@@ -494,6 +496,7 @@
 	  case 'Z':  mpz_set_si ((mpz_ptr) p, (long) retval); break;
 	  }
 	}
+	va_end (last_ap);
 	va_copy (last_ap, ap);
 	last_fmt = fmt;
 	goto next;
@@ -604,7 +607,7 @@
 
 next:
   /* Stop parsing the current "%" format, look for a new one. */
-  ;
+  va_end (this_ap);
 }
 
   TRACE (printf ("remainder: \"%s\"\n", last_fmt));
@@ -616,6 +619,8 @@
   goto error;
 
  done:
+  va_end (last_ap);
+  va_end (ap);
   __GMP_FREE_FUNC_TYPE (alloc_fmt, alloc_fmt_size, char);
   return retval;
 


Re: Fast constant-time gcd computation and modular inversion

2022-09-03 Thread Marco Bodrato

On 2022-09-01 17:04, Torbjörn Granlund wrote:

/* FIXME: Using mpz_invert is cheating. Instead, first compute m' =
   m^-1 (mod 2^k) via Newton/Hensel. We can then get the inverse via

   2^{-k} (mod m) = ((2^k - m') * m + 1)/2^k. */
mpz_invert (t, t, m);
mpn_copyi (info->ip, mpz_limbs_read (t), mpz_size (t));

You might want to use mpn_binvert here.


We should start writing mpn_sec_binvert :-)
Or use the current mpn_sec_invert for the precomputation.
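The Newton/Hensel step mentioned in the FIXME can be sketched for a single 64-bit limb (a hedged illustration, not library code; binv64 is a made-up name). Starting from x = m, which is already an inverse modulo 8 for odd m, each iteration doubles the number of correct bits:

```c
#include <stdint.h>

/* Inverse of an odd m modulo 2^64 via Newton/Hensel iteration. */
static uint64_t binv64 (uint64_t m)
{
  uint64_t x = m;               /* m odd: m*m == 1 (mod 8), so x is
                                   correct to 3 bits */
  for (int i = 0; i < 5; i++)
    x *= 2 - m * x;             /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
  return x;
}
```

The FIXME's closing formula then turns m' into 2^{-k} (mod m); e.g. for m = 7, k = 8 one gets m' = 183 and ((256 - 183)*7 + 1)/256 = 2, which is indeed the inverse of 256 modulo 7.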

Ĝis,
m


Re: More accurate calculation of sieve array size

2022-08-11 Thread Marco Bodrato

Ciao Adrien,

On 2022-08-11 17:43, Adrien Prost-Boucle wrote:

For primorial, when thinking about the code I considered that the call to
  mpz_prodlimbs (x, factors, j)
is the one responsible for (re)sizing x to whatever is appropriate for the 
result.

Indeed, prodlimbs can resize the mpz_t to the needed size.


(and I think I got confused with the reuse of name x in nested loop)


This was a (stylistic) error of mine.
https://gmplib.org/repo/gmp/rev/d337756619a0


If sizing x correctly before the call to mpz_prodlimbs() guarantees that
mpz_prodlimbs() will not resize it, then the current implementation is 
indeed efficient.


That's the aim of the code: avoid resizing. I also had another goal:
in the future, maybe move the computations to the mpn level...

Does the code succeed in avoiding resizes? Or maybe it always allocates a
too-large area (that's a waste too)?

The code reads:

 size = n / GMP_NUMB_BITS;
 size = size + (size >> 1) + 1;

This means that we reserve an area for a result ≤ 2^(n*3/2) ≤ (2.829)^n, 
right?


Wikipedia [ https://en.wikipedia.org/wiki/Primorial ] says:

Using more advanced methods, Rosser and Schoenfeld showed that 
n# ≤ (2.763)^n,

and in Theorem 4, formula 3.14, showed that for n ≥ 563, n# ≥ (2.22)^n.

Does our code make sense, is it tight enough?


This could have deserved a bit of comment ;-)


If we are sure that the trick works... yes, we should write something 
:-)
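On the tightness question: the reservation above is 1.5 bits per unit of n, while the Rosser-Schoenfeld bound only needs log2(2.763) ≈ 1.466. A quick numeric check (a sketch; the two function names are made up, the constants come from the code and the quote above):

```c
#include <math.h>

/* 1.5 bits per n: from size = size + (size >> 1) + 1 in the code. */
static double reserved_bits_per_n (void) { return 1.5; }

/* Bits per n actually proven necessary: n# <= (2.763)^n. */
static double proven_bits_per_n (void) { return log2 (2.763); }  /* ~1.4664 */
```

So the reservation is safe, and over-allocates by only about 2-3% relative to the proven bound: tight enough in practice, even if not minimal.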


Ĝis,
m

--
http://bodrato.it/papers/


Re: More accurate calculation of sieve array size

2022-08-11 Thread Marco Bodrato

Ciao Adrien,

On 2022-08-10 15:37, Adrien Prost-Boucle wrote:

Looking into oddfac and primorial computation code,
I think there is a suboptimality in the computation of the prime sieve 
size.

Could anyone confirm?


Maybe there are some too-conservative assumptions, but we must be careful 
with the code.

--- mpz/oddfac_1.old2022-08-10 15:14:04.511298108 +0200
+++ mpz/oddfac_1.c  2022-08-10 15:31:34.721354647 +0200
@@ -381,8 +381,8 @@
  TMP_MARK;

  flag--;
- size = n / GMP_NUMB_BITS + 4;
- ASSERT (primesieve_size (n - 1) <= size - (size / 2 + 1));
+ size = (n / 3 / GMP_NUMB_BITS + 1) * 2;
+ ASSERT (primesieve_size (n - 1) <= size / 2);
  /* 2-multiswing(n) < 2^(n-1)*sqrt(n/pi) < 2^(n+GMP_NUMB_BITS);
 one more can be overwritten by mul, another for the sieve */
  MPZ_TMP_INIT (mswing, size);


Here, mswing is not just used for the sieve: it will contain the result
of the computation of the "2-multiswing factorial of n". A portion of
the same allocated space is used for the sieve (the ASSERT checks: is
the size needed for the sieve less than half the size we allocate? Yes,
ok, we can go on...).

The comment says that /* 2-multiswing(n) < 2^(n+GMP_NUMB_BITS) */, then
it makes sense to allocate n / GMP_NUMB_BITS + 4 limbs.

Surely this is an over-estimate, and maybe we can estimate it better.
But to reduce the memory usage, we have to estimate the maximal size
of the value that will be stored in mswing, not the sieve only.


--- mpz/primorial_ui.c.old  2022-08-10 14:44:04.437867901 +0200
+++ mpz/primorial_ui.c  2022-08-10 14:55:41.721238758 +0200
@@ -82,8 +82,7 @@
   mp_limb_t prod;
   TMP_DECL;

-  size = n / GMP_NUMB_BITS;
-  size = size + (size >> 1) + 1;
+  size = n / 3 / GMP_NUMB_BITS + 1;
   ASSERT (size >= primesieve_size (n));
   sieve = MPZ_NEWALLOC (x, size);
   size = (gmp_primesieve (sieve, n) + 1) / log_n_max (n) + 1;


The same here. The sieve is stored at the same address as x, but we need
to estimate the size of the value that x will store, not the sieve.
x will store the result, primorial(n). The question is: is the
allocated size reasonable to store the result x = primorial(n)?
The other question, "Can it also contain the sieve?", is secondary.

Ĝis,
m

--
http://bodrato.it/papers/


Re: Use of AVX instructions in mpn_mul_1

2022-06-17 Thread Marco Bodrato

Ciao Thanassis,

On 2022-06-13 23:17, Thanassis Tsiodras wrote:

I had a quick look at the x86_64 assembly implementations of the basic
primitive used in multiplications (mpn_mul_1), and saw this:



...I could not find any use of AVX-integer-related multiplication
instructions.
I am talking about things like "_mm512_mul_epu32", which at first glance
seemed promising (8x32-bit multiplications in one instruction, generating
8x64-bit results in one go).


Four 32x32->64 multiplications perform the same multiplication work as 
one 64x64->128. But are "8x32bit multiplications in one instruction" 
faster than two 64x64 muls? As you confirm, many other additions with 
carry propagation are needed.
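To make the comparison concrete: one 64x64->128 product decomposes into four 32x32->64 partial products plus the carry bookkeeping that the scalar mul instruction performs for free (a plain-C sketch; mul_64x64 and u128 are made-up names):

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* One 64x64->128 product recombined from four 32x32->64 partial
   products: ll, lh, hl, hh, plus explicit middle-column carry handling. */
static u128 mul_64x64 (uint64_t a, uint64_t b)
{
  uint64_t al = (uint32_t) a, ah = a >> 32;
  uint64_t bl = (uint32_t) b, bh = b >> 32;
  uint64_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
  uint64_t mid = (ll >> 32) + (uint32_t) lh + (uint32_t) hl;  /* middle column */
  u128 r;
  r.lo = (mid << 32) | (uint32_t) ll;
  r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
  return r;
}
```

The shifts, masks, and middle-column additions are exactly the extra work that vector 32x32 lanes would have to redo per product.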


So the question is, does using AVX reduce the resources needed for a 
multiplication?



I can't see a way to do that optimally. Is that the reason GMP asm code
seems to prefer the simple 64x64 => 128 instructions?  (mul %rcx)


When you find an implementation with AVX that is more efficient than our 
current one, you can contribute it to the project :-)


Ĝis,
m


Re: Squaring optimization in mpz_addmul/mpz_submul

2022-05-29 Thread Marco Bodrato

Ciao Fredrik,

On 2022-05-25 22:48, Fredrik Johansson wrote:

Now that mpn_mul no longer dispatches to mpn_sqr, the squaring
optimization is missing in mpz_addmul/mpz_submul.


Thanks for spotting this!


mpz_addmul(sum, a, a);



This should be easy to fix.


It is. I fixed it with a small patch.
https://gmplib.org/repo/gmp/rev/61ef108d740c

But I had to also add that case to the test program.
The case mpz_addmul(sum, a, a); was correctly handled,
but it wasn't tested.

And the case mpz_addmul(x, x, x); keeps on being untested!

Ĝis,
m

--
http://bodrato.it/papers/


Re: stronglucas.c

2022-05-29 Thread Marco Bodrato

Ciao,

On 2022-05-18 19:32, Marco Bodrato wrote:

On Wed, 18 May 2022, 7:48 am, Niels Möller wrote:

Seth Troisi  writes:

I was reading the stronglucas code


Great!


-if (((*PTR (n) & 6) == 0) && UNLIKELY (mpz_perfect_square_p 
(n)))


Our test-suite did not trigger that branch.

Now it does: https://gmplib.org/repo/gmp/rev/7ecb57e1bb82

I took the list of base-2 pseudo-primes by Jan Feitsma, published at 
http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html .
Throwing the numbers in the list through the current implementation, it 
was easy to find examples triggering the different branches!


Very useful!


the whole "((*PTR (n) & 6) == 0) &&" code is an optimization,

[...]

Should we avoid repeating the check here and call the
function?


Maybe we can directly call the function...

On the other hand, maybe we should avoid some calls to jacobi...

The current search for an odd D such that the Jacobi symbol (n/|D|) is 
negative is performed by the following code:


  D = GMP_NUMB_BITS % 16 == 0 ? (GMP_NUMB_BITS % 32 == 0 ? 17 : 15) : 5;

  do
{
  if (UNLIKELY (D >= maxD))
return 1;
  D += 2;
  jac = mpz_oddjacobi_ui (n, D);
}
  while (jac == 1);


If we already checked 15, we may skip all the composite D. Should we at 
least use a code that skips the multiple of 3? Something like:


  unsigned Ddiff = 2;
#if GMP_NUMB_BITS % 16 == 0
  const unsigned D2 = 6;
#if GMP_NUMB_BITS % 32 == 0
  D = 19;
  Ddiff = 4;
#else
  D = 17;
#endif
#else
  const unsigned D2 = 4;
  D = 7;
#endif

  for (;;)
{
  jac = mpz_oddjacobi_ui (n, D);
  if (jac != 1)
break;
  if (UNLIKELY (D >= maxD))
return 1;
  D += Ddiff;
  Ddiff = D2 - Ddiff;
}
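The alternating-increment trick can be checked independently of the Jacobi machinery: enumerating the D values the proposed loop visits, in the common GMP_NUMB_BITS % 32 == 0 case, shows that only odd non-multiples of 3 are produced (a small sketch; next_candidates is a made-up name):

```c
/* First candidate D values visited by the proposed loop: start at 19,
   increment alternating 4 and 2, giving 19, 23, 25, 29, 31, ... */
static void next_candidates (unsigned *out, int count)
{
  unsigned D = 19, Ddiff = 4;
  const unsigned D2 = 6;                /* 4 + 2: the alternation period */
  for (int i = 0; i < count; i++) {
    out[i] = D;
    D += Ddiff;
    Ddiff = D2 - Ddiff;
  }
}
```

Since the start is 1 mod 3 and the increments alternate +4, +2, the residues mod 3 cycle through 1, 2, 1, 2, ... and never hit 0.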

In the common case (GMP_NUMB_BITS % 32 == 0), I'd expect that about 50% 
of the numbers entering stronglucas will use D=5 (Fibonacci), 25% D=7, 
1/8 D=11, 1/16 D=13, 1/32 D=15, 1/64 D=17, so that only 1/64 should reach 
this code. I'd expect that D=19 will be used for half of them, 
but... should we check D=21 for the other half?


Ĝis,
m


Re: stronglucas.c

2022-05-18 Thread Marco Bodrato
Ciao,

On Wed, 18 May 2022, 7:48 am, Niels Möller wrote:
> Seth Troisi  writes:
>> I was reading the stronglucas code

Great!

>> diff -r 970b7221873f mpz/stronglucas.c

>>  /* n is odd, to possibly be a square, n % 8 = 1 is needed. */
>> -if (((*PTR (n) & 6) == 0) && UNLIKELY (mpz_perfect_square_p (n)))
>> +if (((*PTR (n) & 7) == 1) && UNLIKELY (mpz_perfect_square_p (n)))
>>return 0; /* A square is composite. */

>> Technically n/x should be odd per the function comment but IMO this
>> improves readability by having the "n % 8 = 1" in the nearer comment
>> match the code.

> I think the way it's currently written is a micro optimization, testing

> That said, I don't think this code is that performance critical, so
> saving a single instruction doesn't matter much. I agree your change
> improves readability a bit.

Shall we remove a (micro) optimization, or shall we improve the comment?

Anyway, the whole "((*PTR (n) & 6) == 0) &&" code is only an optimization:
the first check done by the function mpz_perfect_square_p is based on the
lowest bits. Should we avoid repeating the check here and simply call the
function?

>> It's also possible lines 122-124 should also be indented
>
> Indentation looks right to me, at the current place. But perhaps more
> consistent with nearby code if comments are moved below the "else if"
> line (and then indented accordingly for that location).

line 122
  /* (2^24 - 1) | (2^{GMP_NUMB_BITS*3/4} - 1)   */
is indented as line 97;
  /* (2^12 - 1) | (2^{GMP_NUMB_BITS*3/4} - 1)   */
123 as 99.
Those comments should explain what happens with the given GMP_NUMB_BITS
values.

But I agree, there should also be a comment after the "else if",
explaining how the check for D=17 works.



There are also other comments that should be improved.
The comment to the function (line 80) reads:
/* Requires GCD (x,6) = 1.*/

Inside the code included by
#if GMP_NUMB_BITS % 16 == 0
we have (line 100)
  ASSERT (g % 3 != 0 && g % 5 != 0 && g % 7 != 0);

but I fear that the code actually assumes also
 g % 11 != 0 && g % 13 != 0 && g % 17 != 0

Not a problem with the current use in the library (this code is called
after some trial-division checks), but it is more difficult to read the
code if the comments are not correct.

Ĝis,
m



Re: about prime number generation

2022-05-14 Thread Marco Bodrato

On 2022-05-01 17:01, Marco Bodrato wrote:

On 2022-04-30 23:39, Seth Troisi wrote:

next_prime took up all of my energy


We should further improve that work!

With your improvements to nextprime, it's really a waste of resources
if one has to write
  for (int i = 0; i < delta2; i++)
mpz_nextprime(p, p);
because every sieved range is used just once, instead of being scanned to 
the end.


I mean, we should add a couple of (public or not?) functions:
 mpz_nth_{prec,next}prime


I wrote a simple mpz_nth_nextprime, and here are some timings:

$ build/tune/speed -p1 -rs 10-60 -t5 mpz_nextprime 
mpz_nth_nextprime.2 mpz_nth_nextprime.4 mpz_nth_nextprime.16 
mpz_nth_nextprime.256
overhead 0.7 secs, precision 1 units of 8.56e-10 secs, 
CPU freq 1167.94 MHz

   mpz_nextprime nth_nxtprim.2 nth_nxtprim.4 nth_nxtprm.16 nth_nxtp.256
10  #0.011412.43846.1353   42.1803   4318.7026
15  #0.025872.07954.4340   23.0275   1946.8527
20  #0.254721.82043.4797   13.7392237.0212
25  #0.326771.88303.6122   14.1018225.8131
30  #0.406901.89053.6741   14.3550230.1069
35  #0.535111.82853.5060   13.4946215.2175
40  #0.606371.84213.5508   13.6446217.1993
45  #0.700471.82803.4869   13.5221217.8561
50  #0.781571.81003.5152   13.6302216.4944
55  #0.946821.90493.6773   14.4213231.4945
60  #0.0001060161.86423.5877   14.0455231.0008

This means that the 256th next prime of a large enough number (more 
than 20 bits) is found faster than using _nextprime 256 times.
For smaller numbers this is not evident, for two reasons: the current code 
uses different strategies for numbers smaller than 20 bits, and the 256th 
next prime of a 10-bit number is a 12-bit prime, i.e. a much larger 
number.


But I realize that this is not a clever approach.
My code actually finds all the primes and skips them, which is not 
very useful. The need for a way to loop over primes emerges again.


Ĝis,
m


Re: Use on small mulmod_bnm1 [was: New mulmod_bknp1]

2022-05-01 Thread Marco Bodrato
Ciao Niels,

for some reason I did not receive your answer. I took the text from the
archive.

On Thu, 21 April 2022, 08:15 am, Niels Möller wrote:
> Marco Bodrato  writes:
>> Yes, with a larger expected gain for 3, and a smaller one for 13, or 17.
>
> And in your table, almost all use 3, and none use 7, 13 or 17.

>>  size -> measured time
>>  1080 -> 6.906e-05   (+  72, +12%) 2^3*3^3*5
>>  1104 -> 7.294e-05   (+  24, +5.6%) 2^4*3*23
>
> So 1104 wins over the power of 2, 1024 = 2^10.

Yes, it does. Both on shell and on my laptop, even though in both cases
MUL_FFT_MODF_THRESHOLD is smaller than 512.

>>  1584 -> 0.0001159   (+  72, +4.2%) 2^4*3^2*11
>>  1600 -> 0.0001217   (+  16, +5.1%) 2^6*5^2
>
> The only example in the list using 5 as a factor.

There are some more in the full list...

>>  1984 -> 0.0001648   (+  40, +3.1%) 2^6*31
>
> First example not using any of the odd numbers in the list, so not using
> mulmod_bknp1 at all.

Yes, starting from here, the power of 2 seems dominant.

>>  2048 -> 0.0001676   (+  64, +1.7%) 2^11
>
> I wonder if this would be beaten by 2064 = 2^4*3*43, or if this power
> of two really is a winner.

It is a winner, I measured a much larger range.

> It's also interesting that all these winners use 2^k with k >= 3, so
> third split in mulmod_bnm1 seems to pay off measurably.

On that machine, in that range.

> So just rounding up to a multiple of 24 = 2^3 * 3 might be a reasonable
> initial strategy (above some threshold of a likely few hundred limbs).

The reasonable gaps between the sizes probably are: +2, +4, +8, +12, +24,
+48, then powers of two.

On shell, something like:
+2 up to 36
+4 up to 108
+12 up to 264
+24 up to 936
and up to 1944... +24 again? Even if we have many +72 (preserving a factor
3^2)...
Then, powers of 2.

Ĝis,
m



Re: about prime number generation

2022-05-01 Thread Marco Bodrato

Ciao Seth,

On 2022-04-30 23:39, Seth Troisi wrote:

I worked on nth_prime(n) in a series of patches three years ago


That code was very interesting.
It would be nice to get rid of the double type and the function log(). 
The library is not using math.h and libm anywhere now. But it doesn't 
seem simple, at least to me.


We should start by pushing a primepi(N) function, counting the primes 
between 1 and N.

I have a function, which is almost ready, with the following prototype:
long gmp_pi_ui (unsigned long n)

But to have a function that really belongs to GMP, we should extend it 
to at least mpz_pi (mpz_t n) ... with a reasonable return type ...



Sadly this never made it upstream as the review process for a faster
next_prime took up all of my energy


We should further improve that work!

With your improvements to nextprime, it's really a waste of resources if 
one has to write

  for (int i = 0; i < delta2; i++)
mpz_nextprime(p, p);
because every sieved range is used just once, instead of being scanned to 
the end.


I mean, we should add a couple of (public or not?) functions:
 mpz_nth_{prec,next}prime


In practice I'd recommend you also take a look at the
highly optimized code from kimwalisch
https://github.com/kimwalisch/primecount


Of course a specialized library will remain faster than some small piece 
of code in GMP...


Ĝis,
m


Re: Use on small mulmod_bnm1 [was: New mulmod_bknp1]

2022-04-20 Thread Marco Bodrato

Ciao,

On 2022-04-19 21:03, ni...@lysator.liu.se wrote:

Marco Bodrato  writes:


In the 128-2048 range (at least on that machine: shell.gmplib.org) the
sizes multiple of 12, 24, 48... should be preferred...


I don't fully understand this, but if I got it right, we want the size n
to be divisible by 2^k, for mulmod_bnm1 to be able to split a few times.

But we don't need a very large k, since we have diminishing returns for
each split.

And for the new mulmod_bknp1 to fit, we also need n to be divisible by
one of certain small odd numbers, currently 3, 5, 7, 13, 17.


Yes, with a larger expected gain for 3, and a smaller one for 13, or 17.


I think it would make sense to first select k (maybe a constant, or
growing very slowly with n, which might ask for a tuned table), say
2 <= k <= 5 for the size range of interest. And then round n upwards to
the closest multiple of one of 2^k * {3, 5, 7, 13, 17}. Hmm, or maybe to
make it more complex, one of 2^{k,k-1} * {3, 5, 7, 13, 17}, if that
lets us avoid growth. It would be nice if we could find a set of
candidates that guarantees that we don't have to increase size more
than, say, 10%, but not sure if that's possible.


It should be possible to not increase too much.
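Niels's rounding idea can be sketched directly; fixing k = 4 is my own arbitrary choice for illustration (next_size is a made-up name, and a real version would tune k):

```c
/* Round n up to the nearest multiple of 2^4 * f, f in {3, 5, 7, 13, 17}. */
static unsigned next_size (unsigned n)
{
  static const unsigned f[] = { 3, 5, 7, 13, 17 };
  unsigned best = ~0u;
  for (int i = 0; i < 5; i++) {
    unsigned step = 16 * f[i];                   /* 2^4 * f */
    unsigned c = (n + step - 1) / step * step;   /* round up to a multiple */
    if (c < best)
      best = c;
  }
  return best;
}
```

On the measured list this maps 1000 to 1008 and 1100 to 1104, both of which appear among the best sizes, so even this crude k keeps the growth small.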

The following is a list of the best sizes wrt time spent in mulmod_bnm1,
extracted from the range 1024..2048.
I'm not sure the program I wrote really shows what is needed.
Nevertheless, if the list contains 1008 and 1080, this means that for 
every size in the range (1008..1080) the time for a multiplication using 
that size is larger than the time spent by the last point in the 
interval.


 size -> measured time
 1008 -> 6.156e-05   (+  72, +8.3%) 2^4*3^2*7
 1080 -> 6.906e-05   (+  72, +12%) 2^3*3^3*5
 1104 -> 7.294e-05   (+  24, +5.6%) 2^4*3*23
 1128 -> 7.686e-05   (+  24, +5.4%) 2^3*3*47
 1200 -> 7.986e-05   (+  72, +3.9%) 2^4*3*5^2
 1224 -> 8.28e-05(+  24, +3.7%) 2^3*3^2*17
 1296 -> 8.602e-05   (+  72, +3.9%) 2^4*3^4
 1320 -> 9.437e-05   (+  24, +9.7%) 2^3*3*5*11
 1368 -> 9.824e-05   (+  48, +4.1%) 2^3*3^2*19
 1392 -> 0.0001022   (+  24, + 4%) 2^4*3*29
 1416 -> 0.0001087   (+  24, +6.4%) 2^3*3*59
 1512 -> 0.0001112   (+  96, +2.3%) 2^3*3^3*7
 1584 -> 0.0001159   (+  72, +4.2%) 2^4*3^2*11
 1600 -> 0.0001217   (+  16, +5.1%) 2^6*5^2
 1680 -> 0.0001273   (+  80, +4.6%) 2^4*3*5*7
 1704 -> 0.0001396   (+  24, +9.7%) 2^3*3*71
 1728 -> 0.00014 (+  24, +0.23%) 2^6*3^3
 1776 -> 0.0001434   (+  48, +2.4%) 2^4*3*37
 1800 -> 0.0001439   (+  24, +0.35%) 2^3*3^2*5^2
 1872 -> 0.0001463   (+  72, +1.7%) 2^4*3^2*13
 1920 -> 0.000158    (+  48, + 8%) 2^7*3*5
 1944 -> 0.0001598   (+  24, +1.2%) 2^3*3^5
 1984 -> 0.0001648   (+  40, +3.1%) 2^6*31
 2048 -> 0.0001676   (+  64, +1.7%) 2^11

Ĝis,
m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Use on large mul_fft [was: New mulmod_bknp1]

2022-04-16 Thread Marco Bodrato

Il 2022-04-15 19:56 Marco Bodrato ha scritto:

Ciao,


Il 2022-02-15 11:48 Marco Bodrato ha scritto:
I pushed some new functions to compute the product (and square) modulo
B^nn+1, for small values of the exponent nn.


Currently that code is used by two functions.
One is mulmod_bnm1,


The second one is mul_fft, thanks to the patch named 
"mpn/generic/mul_fft.c: Use _bknp1," [...], available at

https://gmplib.org/repo/gmp/rev/f9cbcda05f7e

I attach an image (fakte+eble_18327.png), showing the relative speed of
the code after/before the patch.

The green line represents the gain with the current code. The orange one 
is estimated, using not the actual time of the function for a given 
size, but the best timings among the usable sizes.


Even if also for this range we see the effect "interesting improvement 
for some sizes, no news for some other sizes", the graph is not so 
"noisy".


The two lines (green, and orange) show more or less the same shape. 
Working on a better criterion to choose the mulmod size for large sizes 
is probably not worth doing.

Maybe something can be done on the FFT side? I didn't try.

Ĝis,
m


Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Marco Bodrato

Ciao Paul,

Il 2022-03-08 16:20 Paul Zimmermann ha scritto:

Uhm, the last line of code just before that ones is:

   cc = mpn_sub_1 (r, r, m, cc) + 1;
   /* add 1 to cc instead of rd since rd might overflow */



it seems you are right!


Well, I pushed a clean-up for that portion of the code too...

https://gmplib.org/repo/gmp/rev/84893dca3e6c

Ĝis,
m


Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Marco Bodrato

Ciao Paul,

Il 2022-03-08 12:56 Paul Zimmermann ha scritto:
Since you deeply know how this code works, I ask you one more 
question.

The last lines of the function mpn_fft_mul_2exp_modF (branch m < n)
contain:

   /* now subtract cc and rd from r[m..n] */

   r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
   r[n] -= mpn_sub_1 (r + m, r + m, n - m, rd);
   if (r[n] & GMP_LIMB_HIGHBIT)
 r[n] = mpn_add_1 (r, r, n, CNST_LIMB(1));

This code assumes that there can only be a single borrow. Is it 
correct?



yes there can only be a single borrow, since cc is 0 or 1,


Uhm, the last line of code just before that ones is:

  cc = mpn_sub_1 (r, r, m, cc) + 1;
  /* add 1 to cc instead of rd since rd might overflow */

So that I'd say that cc is 1 or 2.
Then the case cc=2, m=n-1, r[m]=0, and rd=GMP_NUMB_MAX seems very very 
unlikely, but possible.



if r[n] is non zero after r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc),
this can only happen for cc=1 and {r+m,n-m} = 000...000 before the
mpn_sub_1 call, and {r+m,n-m} = 111...111 after, then the second
mpn_sub_1 call cannot produce a borrow.


Of course a second borrow may only happen if n-m=1,
i.e. if {r+m,n-m} contains a single limb.

Ĝis,
m


Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Marco Bodrato

Ciao Paul,

Il 2022-03-07 10:28 Paul Zimmermann ha scritto:

Date: Sun, 06 Mar 2022 11:14:49 +0100
From: Marco Bodrato 



Specifically I'd focus into a suspicious piece of code, shared by both
our current code and Vivien's.



if (cy >= 0)
  cy = mpn_add_1 (tmp, tmp, Kl, cy);
else
  cy = mpn_sub_1 (tmp, tmp, Kl, -cy);



(resp. added) at tmp[0]. Here are some comments added to the code (from
GMP 6.2.1) and a fix:



if (cy >= 0)
  cy = mpn_add_1 (tmp, tmp, Kl, cy);
+/* cy is now the carry at tmp[Kl] */
else
+{
  cy = mpn_sub_1 (tmp, tmp, Kl, -cy);
+  /* cy is now the borrow at tmp[Kl] */
+  if (cy)
+cy = mpn_add_1 (tmp, tmp, Kl, cy);
+  /* cy is now the carry at tmp[Kl] */
+}


Your fix of course is correct, that's the full normalization for the 
else branch, but for that branch only.


The patch I pushed yesterday introduces an initial

tmp[Kl] = 0;

and then uses INCR_U or DECR_U

if (cy >= 0)
  MPN_INCR_U (tmp, Kl + 1, cy);
else
  {
tmp[Kl] = 1;
MPN_DECR_U (tmp, Kl + 1, -cy - 1);
  }

The resulting value in tmp may be almost any value with Kl limbs and an 
additional high 0 or 1.




Since you deeply know how this code works, I ask you one more question.
The last lines of the function mpn_fft_mul_2exp_modF (branch m < n) 
contain:


  /* now subtract cc and rd from r[m..n] */

  r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
  r[n] -= mpn_sub_1 (r + m, r + m, n - m, rd);
  if (r[n] & GMP_LIMB_HIGHBIT)
r[n] = mpn_add_1 (r, r, n, CNST_LIMB(1));

This code assumes that there can only be a single borrow. Is it correct?
I'm going to change them into the following:

  r[n] = 2;
  MPN_DECR_U (r + m, n - m + 1, cc);
  MPN_DECR_U (r + m, n - m + 1, rd);
  if (UNLIKELY ((r[n] -= 2) != 0))
{
  mp_limb_t cy = -r[n];
  /* cy should always be 1, but we did not check if the case
 m=n-1, r[m]=0, cc+rd>GMP_NUMB_MAX+1, and then cy = 2,
 is actually possible. */
  r[n] = 0;
  MPN_INCR_U (r, n + 1, cy);
}

If we really can assume that cc+rd <= GMP_NUMB_MAX+1, the code could be 
simpler:


  r[n] = 1;
  MPN_DECR_U (r + m, n - m + 1, cc);
  MPN_DECR_U (r + m, n - m + 1, rd);
  if (UNLIKELY (r[n] == 0))
MPN_INCR_U (r, n + 1, 1);
  else
r[n] = 0;

Ĝis,
m

--
http://bodrato.it/papers/


Re: mul_fft, cleaning up some details of the code

2022-03-06 Thread Marco Bodrato

Ciao,

Il 2022-03-06 12:16 Torbjörn Granlund ha scritto:

Marco Bodrato  writes:



  The comment before the mpn_mul_fft_decompose function says "We must
  have nl <= 2*K*l", this means that we should never have ((dif = nl -
  Kl) > Kl), and the code in that branch should never be used.



I noticed that at some point too, and have had a patch lying around for
quite a while:


You basically propose to remove the branch.
I disabled it when we don't WANT_OLD_FFT_FULL, and corrected it anyway.


(Not sure about the separate FIXME change therein; it looks strange to
me now.)


I'm not sure I understand it...


Feel free to clean this up.


Pushed a patch.

Ĝis,
m

--
http://bodrato.it/papers/


mul_fft, cleaning up some details of the code

2022-03-06 Thread Marco Bodrato

Ciao,

I looked into the code published by Samuel Vivien.
I realised that in mul_fft there are a lot of _add_1 and _sub_1. At 
least some of them can be easily replaced by _INCR_U or _DECR_U...


Specifically I'd focus into a suspicious piece of code, shared by both 
our current code and Vivien's.


The function mpn_mul_fft_decompose has a branch "if (dif > Kl)", that
ends with the following lines:

  if (cy >= 0)
cy = mpn_add_1 (tmp, tmp, Kl, cy);
  else
cy = mpn_sub_1 (tmp, tmp, Kl, -cy);

Can those lines be correct? Can the value in cy be used after this code 
both if it contains the carry or if it contains the borrow of the 
operation on tmp?


I believe this is an error, I'd change the last line into
cy = 1 - mpn_sub_1 (tmp, tmp, Kl, -cy - 1);
and I propose the attached patch changing this and some other _1 
functions.


This is not really needed to solve a bug.
The comment before the mpn_mul_fft_decompose function says "We must have 
nl <= 2*K*l", this means that we should never have ((dif = nl - Kl) > 
Kl), and the code in that branch should never be used.
According to the coverage analysis, the code is not explored by the 
tests:

https://gmplib.org/devel/lcov/shell/var/tmp/lcov/gmp/mpn/mul_fft.c.gcov#691
The "bug" is never triggered.

Maybe the branch could be used if someone re-enables mpn_mul_fft_full 
and uses it for very unbalanced (more than 4x1) operands?


Ĝis,
m

diff -r 6ff25532f2a4 mpn/generic/mul_fft.c
--- a/mpn/generic/mul_fft.c	Fri Mar 04 10:03:38 2022 +0100
+++ b/mpn/generic/mul_fft.c	Sun Mar 06 09:41:58 2022 +0100
@@ -237,14 +237,14 @@
 
   r[n] = 0;
   /* cc < 2^sh <= 2^(GMP_NUMB_BITS-1) thus no overflow here */
-  cc++;
-  mpn_incr_u (r, cc);
+  ++cc;
+  MPN_INCR_U (r, n + 1, cc);
 
-  rd++;
+  ++rd;
   /* rd might overflow when sh=GMP_NUMB_BITS-1 */
-  cc = (rd == 0) ? 1 : rd;
+  cc = rd + (rd == 0);
   r = r + m + (rd == 0);
-  mpn_incr_u (r, cc);
+  MPN_INCR_U (r, n + 1 - m - (rd == 0), cc);
 }
   else
 {
@@ -284,7 +284,10 @@
   r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
   r[n] -= mpn_sub_1 (r + m, r + m, n - m, rd);
   if (r[n] & GMP_LIMB_HIGHBIT)
-	r[n] = mpn_add_1 (r, r, n, CNST_LIMB(1));
+	{
+	  r[n] = 0;
+	  MPN_INCR_U (r, n + 1, CNST_LIMB(1));
+	}
 }
 }
 
@@ -560,7 +563,9 @@
 	  */
 	  tp[0] += cc;
 	}
-	  a[n] = mpn_sub_n (a, tp, tpn, n) && mpn_add_1 (a, a, n, CNST_LIMB(1));
+	  cc = mpn_sub_n (a, tp, tpn, n);
+	  a[n] = 0;
+	  MPN_INCR_U (a, n + 1, cc);
 	}
 }
   TMP_FREE;
@@ -657,8 +662,7 @@
 }
 
   /* remains to subtract {ap+n, l} from {rp, n+1} */
-  cc = mpn_sub_n (rp, rp, ap + n, l);
-  rpn -= mpn_sub_1 (rp + l, rp + l, n - l, cc);
+  rpn -= mpn_sub (rp, rp, n, ap + n, l);
   if (rpn < 0) /* necessarily rpn = -1 */
 rpn = mpn_add_1 (rp, rp, n, CNST_LIMB(1));
   return rpn;
@@ -688,7 +692,10 @@
 
   tmp = TMP_BALLOC_LIMBS(Kl + 1);
 
-  if (dif > Kl)
+  tmp[Kl] = 0;
+  /* The comment "We must have nl <= 2*K*l." says that
+	 ((dif = nl - Kl) > Kl) should never happen. */
+  if (UNLIKELY (dif > Kl))
 	{
 	  int subp = 0;
 
@@ -713,16 +720,18 @@
 	  else
 	cy -= mpn_add (tmp, tmp, Kl, n, dif);
 	  if (cy >= 0)
-	cy = mpn_add_1 (tmp, tmp, Kl, cy);
+	MPN_INCR_U (tmp, Kl + 1, cy);
 	  else
-	cy = mpn_sub_1 (tmp, tmp, Kl, -cy);
+	{
+	  tmp[Kl] = 1;
+	  MPN_DECR_U (tmp, Kl + 1, -cy - 1);
+	}
 	}
   else /* dif <= Kl, i.e. nl <= 2 * Kl */
 	{
 	  cy = mpn_sub (tmp, n, Kl, n + Kl, dif);
-	  cy = mpn_add_1 (tmp, tmp, Kl, cy);
+	  MPN_INCR_U (tmp, Kl + 1, cy);
 	}
-  tmp[Kl] = cy;
   nl = Kl + 1;
   n = tmp;
 }
@@ -755,7 +764,7 @@
 
 static mp_limb_t
 mpn_mul_fft_internal (mp_ptr op, mp_size_t pl, int k,
-		  mp_ptr *Ap, mp_ptr *Bp, mp_ptr A, mp_ptr B,
+		  mp_ptr *Ap, mp_ptr *Bp, mp_ptr unusedA, mp_ptr B,
 		  mp_size_t nprime, mp_size_t l, mp_size_t Mp,
 		  int **fft_l, mp_ptr T, int sqr)
 {
@@ -797,9 +806,7 @@
 
   j = (K - i) & (K - 1);
 
-  if (mpn_add_n (n, n, Bp[j], nprime + 1))
-	cc += mpn_add_1 (n + nprime + 1, n + nprime + 1,
-			  pla - sh - nprime - 1, CNST_LIMB(1));
+  cc += mpn_add (n, n, pla - sh, Bp[j], nprime + 1);
   T[2 * l] = i + 1; /* T = (i + 1)*2^(2*M) */
   if (mpn_cmp (Bp[j], T, nprime + 1) > 0)
 	{ /* subtract 2^N'+1 */
@@ -825,8 +832,7 @@
 	}
   else
 	{
-	  cc = mpn_sub_1 (p + pla - pl, p + pla - pl, pl, cc);
-	  ASSERT (cc == 0);
+	  MPN_DECR_U (p + pla - pl, pl, cc);
 	}
 }
   else
@@ -918,18 +924,17 @@
 
   A = TMP_BALLOC_LIMBS (K * (nprime + 1));
   Ap = TMP_BALLOC_MP_PTRS (K);
+  Bp = TMP_BALLOC_MP_PTRS (K);
   mpn_mul_fft_decompose (A, Ap, K, nprime, n, nl, l, Mp, T);
   if (sqr)
 {
   mp_size_t pla;
   pla = l * (K - 1) + nprime + 1; /* number of required limbs for p */
   B = TMP_BALLOC_LIMBS (pla);
-  Bp = TMP_BALLOC_MP_PTRS (K);
 }
   else
 {
 

Re: New mulmod_bknp1

2022-03-04 Thread Marco Bodrato

Ciao,

Il 2022-02-23 12:35 Marco Bodrato ha scritto:

... then maybe I was wrong when I wrote that we should not trade a
factor 3 with a factor 2...


Yes, I think I was wrong. I pushed to the repository the use of the 
_bknp1 (mod B^kn+1) code on the +1 side of the _bnm1 multiplication code 
(mod B^2n-1).


And here are the measures by tune/speed on our development machine 
"shell".
The _rounded measures use a larger size, rounded to some multiple of 
some power of two. Actually, they are not faster than the non-rounded 
size. Moreover, before the patch the _rounded cost was monotonic, now 
it's not.

The rounding strategy needs a revision.

[bodrato@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed 
-p1 -s 72-900 -rc -t36 mpn_mulmod_bnm1_rounded mpn_mulmod_bnm1
overhead 5.84 cycles, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz

mpn_mulmod_bnm1_rounded mpn_mulmod_bnm1
72     4939.99   #0.9984
108  10049.80   #0.9249
144  12718.57   #0.9997
180  22397.62   #0.8529
216  23185.73   #0.9993
252  36533.91   #0.8669
288  34125.65   #0.9981
324  56029.22   #0.8225
360  48846.70   #0.
396  63004.78   #0.9731
432  61179.69   #0.9996
468  95840.50   #0.8255
504  #80506.97    1.0001
540 113637.14   #0.8538
576  #92016.26    1.0002
612 127849.99   #0.9169
648 #115918.53    1.
684 161355.14   #0.8673
720 130868.85   #0.9980
756 166164.06   #0.9613
792 197147.66   #0.7897
828 196738.21   #0.9304
864 212050.34   #0.9630
900 217971.55   #0.9548

Ĝis,
m

--
http://bodrato.it/papers/


Re: binvert_limb speedup on 64 bit machines with UHWtype

2022-03-01 Thread Marco Bodrato

Ciao,

Il 2022-02-27 16:52 Marco Bodrato ha scritto:

Il 2022-02-25 17:06 John Gatrell ha scritto:
I tested using UHWtype in the macro for binvert_limb. On a 64 bit 
machine
one of my programs gained a 3% speedup. On a 32 bit machine, there was 
no



Should we use uint_fast8_t, uint_fast16_t, uint_fast32_t for the
different levels, and let the compiler choose? :-D


I tried code with uint_fast types, but it seems that the compiler is not 
choosing the faster type, the 64-bits type is always used :-(


You should try to store also the 32-bits result into the half-type.

I mean: try replacing the following two lines in your code
__inv = 2 * __hinv - __hinv * __hinv * __n;  /* 32 */ \
__inv = 2 * __inv - __inv * __inv * __n;     /* 64 */ \

with

__hinv = 2 * __hinv - __hinv * __hinv * __n;                      /* 32 */ \
__inv = 2 * (mp_limb_t)__hinv - (mp_limb_t)__hinv * __hinv * __n; /* 64 */ \


Ĝis,
m


Re: binvert_limb speedup on 64 bit machines with UHWtype

2022-02-27 Thread Marco Bodrato

Ciao,

Il 2022-02-27 18:13 Marco Bodrato ha scritto:

I did never look into that file :-)


I inserted there a few more versions of binvert_limb.
The attached patch is only for testing, not to be pushed (I used 
uint_fast##_t types).


On shell I get the following:
[bodrato@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -p1 
-s 1 -rc binvert_limb binvert_limb_sec binvert_limb_pipe 
binvert_limb_uintfast
overhead 5.84 cycles, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz

   binvert_limb binvert_limb_sec binvert_limb_pipe binvert_limb_uintfast
1 28.39  1.0485  #0.8930   0.

The _sec version is 5% slower, and the _pipe one is 10% faster than the 
current.


Ĝis,
m

diff -r 1cafba189d5a tune/modlinv.c
--- a/tune/modlinv.c	Sun Feb 27 15:10:38 2022 +0100
+++ b/tune/modlinv.c	Sun Feb 27 18:35:48 2022 +0100
@@ -1,7 +1,7 @@
 /* Alternate implementations of binvert_limb to compare speeds. */
 
 /*
-Copyright 2000, 2002 Free Software Foundation, Inc.
+Copyright 2000, 2002, 2022 Free Software Foundation, Inc.
 
 This file is part of the GNU MP Library.
 
@@ -29,11 +29,133 @@
 GNU Lesser General Public License along with the GNU MP Library.  If not,
 see https://www.gnu.org/licenses/.  */
 
-#include 
 #include "gmp-impl.h"
 #include "longlong.h"
 #include "speed.h"
 
+#define binvert_limb_uintfast(inv,n)	\
+  do {	\
+mp_limb_t  __n = (n);		\
+mp_limb_t  __inv;			\
+uint_fast8_t  __inv8;		\
+ASSERT ((__n & 1) == 1);		\
+	\
+__inv8 = binvert_limb_table[(__n & 0xff)/2]; /*  8 */		\
+if (GMP_NUMB_BITS > 8) {		\
+  uint_fast16_t __inv16 = 2 * (uint_fast16_t)__inv8 - (uint_fast16_t)__inv8 * __inv8 * (uint_fast16_t)__n; \
+if (GMP_NUMB_BITS > 16) {		\
+  uint_fast32_t __inv32 = 2 * (uint_fast32_t)__inv16 - (uint_fast32_t)__inv16 * __inv16 * (uint_fast32_t)__n; \
+if (GMP_NUMB_BITS > 32) {		\
+__inv = 2 * (mp_limb_t)__inv32 - (mp_limb_t)__inv32 * __inv32 * __n; \
+} else { __inv = __inv32; };	\
+} else { __inv = __inv16; };	\
+} else { __inv = __inv8; };		\
+	\
+if (GMP_NUMB_BITS > 64)		\
+  {	\
+	int  __invbits = 64;		\
+	do {\
+	  __inv = 2 * __inv - __inv * __inv * __n;			\
+	  __invbits *= 2;		\
+	} while (__invbits < GMP_NUMB_BITS);\
+  }	\
+	\
+ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1);			\
+(inv) = __inv & GMP_NUMB_MASK;	\
+  } while (0)
+
+/* An (imperfect) copy of the binvert function contained in
+   mpn/generic/sec_powm.c .
+ */
+
+#if GMP_LIMB_BITS <= 32
+#define binvert_limb_sec(inv,n)	\
+  do {\
+mp_limb_t  __n = (n);	\
+mp_limb_t  __inv, __t;	\
+ASSERT ((__n & 1) == 1);	\
+/* 3 + 2 -> 5 */		\
+__inv = __n + (((__n + 1) << 1) & 0x18);			\
+\
+__t = __n * __inv;		\
+/* 5 x 2 + 2 -> 12 */	\
+__inv = 2*__inv - __inv*__t + ((__inv<<10)&-(__t&(1<<5)));	\
+\
+if (GMP_NUMB_BITS > 12)	\
+  {\
+__t = __n * __inv - 1;	\
+	/* 12 x 3 -> 36 */	\
+	__inv += __inv * __t * (__t - 1);			\
+  }\
+\
+ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1);		\
+(inv) = __inv & GMP_NUMB_MASK;\
+  } while (0)
+#else /* GMP_LIMB_BITS > 32, but actually > 48 is required */
+#define binvert_limb_sec(inv,n)	\
+  do {\
+mp_limb_t  __n = (n);	\
+mp_limb_t  __inv, __t;	\
+ASSERT ((__n & 1) == 1);	\
+/* 3 + 2 -> 5 */		\
+__inv = __n + (((__n + 1) << 1) & 0x18);			\
+\
+__t = __n * __inv;		\
+/* 5 x 2 + 2 -> 12 */	\
+__inv = 2*__inv - __inv*__t + ((__inv<<10)&-(__t&(1<<5)));	\
+\
+__t = __n * __inv - 1;	\
+mp_limb_t __t2 = __t * __t;	\
+/* 12 x 5 + 4 -> 64 */   	\
+__inv *= (__t2+1)*(__t2-__t)+1 -((__t<<48)&-(__t&(1<<12))); \
+\
+/* 64 -> 128 -> 256 -> ... */\
+for (int __b = (GMP_NUMB_BITS - 1) >> 6; __b != 0; __b >>= 1)	\
+  __inv = 2 * __inv - __inv * __inv * __n;			\
+\
+ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1);		\
+(inv) = __inv & GMP_NUMB_MASK;\
+  } while (0)
+#endif
+
+/* Like the standard version in gmp-impl.h, but with a different path
+   for bit sizes larger than 32, with concurrent multiplications.  */
+
+#define binvert_limb_pipe(inv,n)	\
+  do {	\
+mp_limb_t  __n = (n);		\
+mp_limb_t  __inv;			\
+ASSERT ((__n & 1) == 1);		\
+	\
+__inv = binvert_limb_table[(__n & 0xFF) / 2]; /*  8 */		\
+if (GMP_NUMB_BITS <= 32)		\
+  {	\
+	if (GMP_NUMB_BITS > 8)		\
+	  __inv = 2 

Re: binvert_limb speedup on 64 bit machines with UHWtype

2022-02-27 Thread Marco Bodrato

Ciao Torbjörn,

Il 2022-02-27 10:53 Torbjörn Granlund ha scritto:

Perhaps one could write it (n & 0xff)/2 and get better code


Actually, this trick is already used! In tune/modlinv.c :-/
The following line dates back to 2002:
__inv = binvert_limb_table[(__n&0xFF)/2]; /*  8 */  \

I did never look into that file :-)

Ĝis,
m


Re: binvert_limb speedup on 64 bit machines with UHWtype

2022-02-27 Thread Marco Bodrato

Ciao John Gartell,

Il 2022-02-25 17:06 John Gatrell ha scritto:
Hi everyone. I'm new here so I don't know how to submit a new 
gmp-impl.h


This list is the correct place for such a discussion.

I tested using UHWtype in the macro for binvert_limb. On a 64 bit 
machine
one of my programs gained a 3% speedup. On a 32 bit machine, there was 
no


Unfortunately, on different platforms, we have different speeds.
Should we use uint_fast8_t, uint_fast16_t, uint_fast32_t for the 
different levels, and let the compiler choose? :-D


Anyway, if you are curious, you can also try what happens with a 
table-lookup-free binvert_limb function. You can find it, named 
sec_binvert_limb, in the file mpn/generic/sec_powm.c .


There are also architectures where smaller operations are NOT faster, 
but, on the other side, more than one mul instruction can run 
concurrently. For those architectures I'd suggest the following, any 
couple of multiplications beyond the 32-bits limit can be computed in 
parallel.


#define binvert_limb(inv,n)                                             \
  do {                                                                  \
    mp_limb_t  __n = (n);                                               \
    mp_limb_t  __inv;                                                   \
    ASSERT ((__n & 1) == 1);                                            \
                                                                        \
    __inv = binvert_limb_table[(__n/2) & 0x7F]; /*  8 */                \
    if (GMP_NUMB_BITS <= 32)                                            \
      {                                                                 \
        if (GMP_NUMB_BITS > 8)                                          \
          __inv = 2 * __inv - __inv * __inv * __n;                      \
        if (GMP_NUMB_BITS > 16)                                         \
          __inv = 2 * __inv - __inv * __inv * __n;                      \
      }                                                                 \
    else                                                                \
      {                                                                 \
        mp_limb_t  __t, __t2, __p;                                      \
        int  __invbits = 32;                                            \
        __t = __n * __inv - 1;      /* x */                             \
        __t2 = __t * __t;           /* x^2 */                           \
        __p = __inv * (__t - __t2); /* (1-x^2)x/(x+1) */                \
                                                                        \
        while (__invbits < GMP_NUMB_BITS - 8)                           \
          {                                                             \
            __p  *= __t2 + 1;   /* (1-x^{2^n})x/(x+1) */                \
            __t2 *= __t2;       /* x^{2^n} */                           \
            __invbits <<= 1;                                            \
          }                                                             \
        /* __inv * (x^{2^{n+2}+1}-1)/(x+1) */                           \
        __inv -= __p * (__t2 + 1);                                      \
      }                                                                 \
                                                                        \
    ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1);                        \
    (inv) = __inv & GMP_NUMB_MASK;                                      \
  } while (0)

In any case, for 64-bits machines, all the described functions use 6 
multiplications :-) but with very different paths toward the desired 
result! :-D


Maybe you can try them on your platform... and tell us exactly which CPU 
are you using for your tests, and how do they behave.


Ĝis,
m


Re: New mulmod_bknp1

2022-02-23 Thread Marco Bodrato

Ciao Niels and David,

Il 2022-02-23 09:14 ni...@lysator.liu.se ha scritto:

N^3-1 = (N^2+N+1)/3 * 3^{k+1} * (N-1)/3^k



Could be implemented as one multiply each mod N^2+N+1 and (N-1),
followed by reduction mod (N^2+N+1)/3 and (N-1)/3^k at the end; these
reductions correspond to small quotients and should be cheap (I
suspect we have to require canonical reduction for CRT reconstruction 
to


Yes, there are many details that one needs to take into account to 
implement something like that...


Il 2022-02-22 22:55 David Harvey ha scritto:

What about

B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?

Also, I suspect that some of these tricks are worth doing even if the
factorisations cross the B boundaries. The extra shifting overhead is 
only

linear.


I do not, but if someone has the time to try to implement some 
extensions on the _bnm1 side (B^n-1), I'd suggest to try at first the 
"cross the B boundaries" strategy.


I mean, if we have H s.t. H^2 = B, I'd try to split
B^33 - 1 = (H^33 - 1)(H^33 + 1)


One more observation.

Currently, the library may require a product eg. mod B^72-1. The current 
_bnm1 code splits the computation into

B^72-1 = (B^36+1)(B^18+1)(B^9+1)(B^9-1)

I'd expect that more than half the time is spent computing mod B^36+1.
Less than 1/4 the time is spent mod B^18+1.
Less than 1/8 the time is spent for each of the two last factors B^9+1, 
and B^9-1.



... then maybe I was wrong when I wrote that we should not trade a 
factor 3 with a factor 2...


Ĝis,
m


Re: New mulmod_bknp1

2022-02-22 Thread Marco Bodrato
Ciao David,

Il Mar, 22 Febbraio 2022 10:55 pm, David Harvey ha scritto:
> On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote:
>> > E.g, in this case we could try a top-level B^66 - 1 product, split in
>> > B^33+1 and B^33-1; then the former suits your new algorithm well, but
>> > the latter can't be recursively split (at least not on a B boundary).

>> I fully agree.

> What about
>
> B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?

Actually, B is a number.

And if B is a power of 4 (there is an even number of bits in a limb, as
usually happens, e.g. 32, or 64) then (B^n-1) and (B^2n+B^n+1) are not
coprime, because both are divisible by 3. And we can not use CRT.

> Also, I suspect that some of these tricks are worth doing even if the
> factorisations cross the B boundaries. The extra shifting overhead is only
> linear.

Maybe. One should write the code and try.

Ĝis,
m



Re: New mulmod_bknp1

2022-02-22 Thread Marco Bodrato
Ciao,

Il Mar, 22 Febbraio 2022 8:04 pm, Niels Möller ha scritto:
> Marco Bodrato  writes:
>> Simply, if a multiplication mod B^{3n}+1 is needed, the code computes
>>  - a product mod B^{n}+1
>>  - a product mod B^{2n}-B^{n}+1
>>  - with CRT, the desired result is obtained.

> Ok, and can the second product be computed more efficiently than full
> product + mod operation?

No. At least, I don't know; and it's currently implemented as a product+mod.

But, if M(n) (the cost of a nxn product) is more than linear in n, then
it's a good idea to split the computation into M(a)+M(b) where a+b=n.

>> This is generalised for multiplications mod B^{kn}+1 for k in

> Intuitively, I'd expect the gain is largest for k = 3 (since for larger
> k, we get even further away from splitting in half). Is that right?

Yes, indeed!

> It seems a bit tricky to take this into account in
> mpn_mulmod_bnm1_next_size, but maybe it's doable?
>
> E.g, in this case we could try a top-level B^66 - 1 product, split in
> B^33+1 and B^33-1; then the former suits your new algorithm well, but
the latter can't be recursively split (at least not on a B boundary). If

I fully agree.

MULMOD_BNM1_THRESHOLD is usually small, and the expected gain on the bnm1
side is larger than on the bnp1 side. We should never trade a 2 factor
with a 3 factor (or 8 with 9...).

But, assume MULMOD_BNM1_THRESHOLD > 15, should we prefer the size 14*32 or
the size 15*32? Maybe the latter!

> And we also have mpn_fft_next_size, but I'm not familiar with how that
> works.

I believe that code is quite simple, it chooses the first multiple of 2^k,
taking k from the "the best k for a given range" table.

But question "how does the new code interacts with the code that generate
the 'best k' table?" seems more complex.

Ĝis,
m



Re: New mulmod_bknp1

2022-02-22 Thread Marco Bodrato

Ciao,

Il 2022-02-21 01:37 Torbjörn Granlund ha scritto:

I am too busy to examine the code to see what you've done.  Perhaps you
could outline the algorithms here?


Nothing special...
Simply, if a multiplication mod B^{3n}+1 is needed, the code computes
 - a product mod B^{n}+1
 - a product mod B^{2n}-B^{n}+1
 - with CRT, the desired result is obtained.

This is generalised for multiplications mod B^{kn}+1 for k in 
{3,5,7,13,17}.



Is n = 3^t-k now slower than n' = 3^t for small k (with k mod 3 != 0)?


Yes. But that's true for the product mod B^n+1.


Then we could zero-pad such operands...


If a product mod B^64-1 is required to mulmod_bnm1, then the function 
computes two multiplications: mod B^32-1, and mod B^32+1.
Computing a product mod B^33+1 is now faster than computing mod 
B^32+1...
...but the product mod B^33+1 is simply not useful, the calling function 
needs the result mod B^32+1!


Ĝis,
m


Re: New mulmod_bknp1

2022-02-18 Thread Marco Bodrato

Ciao,

Il 2022-02-15 11:48 Marco Bodrato ha scritto:

I pushed some new functions to compute the product (and square) modulo
B^nn+1, for small values of the exponent nn. For large values the
already available fft_mul should be used.


A possible use is replacing the last-level point-wise multiplication of 
mul_fft.

I tried to do that.


The new functions actually require nn=k*n, where k can be 3 or in {5,
7, 13, 17}.


This means that it can be used only when mul_fft falls into such a size.
The results is that for some sizes there is an effect,
for other sizes there is none.
Not optimal, but an improvement anyway...

I attach a graph (fft_with_bknp1.png) obtained plotting the output of
tune/speed -s 256-131072 -t128 mpn_mul_fft
before (_old) and after (_new) the patch.


Unfortunately the current tuning criterion does not interact well with 
the added code.
If the library is re-tuned, then there are sizes where a worst 
combination is chosen and the effect is a slowdown. I attach another 
graph (fft_retuned.png) with the timings obtained as above, the 
"_retuned" line was measured after re-tuning (make -C tune tune).


I don't know how the generation of FFT_TABLE works, should we change it 
somehow?


Ĝis,
m


New mulmod_bknp1

2022-02-15 Thread Marco Bodrato

Ciao,

I pushed some new functions to compute the product (and square) modulo 
B^nn+1, for small values of the exponent nn. For large values the 
already available fft_mul should be used.


The new functions actually require nn=k*n, where k can be 3 or in {5, 7, 
13, 17}.

Also k=11 can be supported, but it's currently disabled.

I added also tune/speed support, but the problem with the analysis is 
that the function is defined for some values only...


Here are some results:

[bodrato@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -Cr -s 
75-85\,240-250\,725-735 mpn_mul_n mpn_mulmod_bknp1
overhead 7.15 cycles, precision 1 units of 2.86e-10 secs, CPU freq 
3500.09 MHz

mpn_mul_n mpn_mulmod_bknp1
75   159.4800   #0.9715
76  #167.2895   n/a
77  #167.15581.0793
78   193.8462   #0.7311
79  #169.8734   n/a
80   182.1500   #0.8597
81   196.0617   #0.7721
82  #170.8780   n/a
83  #170.8916   n/a
84   169.2024   #0.9344
85   188.8235   #0.7898
240  258.7792   #0.7474
241 #254.4274   n/a
242  266.2355   #0.9051
243  260.2798   #0.7414
244 #256.3893   n/a
245  256.8449   #0.8595
246  255.7073   #0.7915
247  257.0324   #0.9904
248 #257.6532   n/a
249  257.2410   #0.7925
250  258.1480   #0.8535
725  378.2538   #0.8652
726  377.4793   #0.8362
727 #376.6836   n/a
728  376.2060   #0.9164
729  379.7462   #0.7682
730  379.1068   #0.8727
731  379.0109   #0.9805
732  377.9754   #0.8285
733 #379.6958   n/a
734 #377.6403   n/a
735  376.0204   #0.8094

As you can see, the new functions are particularly effective when there 
are many factors "3" in the size...


Integrating this in the current mod_bnm1 is easy, but the effect would 
be to accelerate the operations only for some sizes. This may add 
"noise" to the thresholds system. Some more experiments are needed.


In the meanwhile, the base code will be tested by the great testing 
system of GMP :-)


Ĝis,
m


Re: mpq_mul_ui

2022-02-08 Thread Marco Bodrato

Ciao,

Il 2022-01-23 01:34 Marc Glisse ha scritto:

Hello,

What would you think of adding mpq_mul_ui, mpq_div_ui, mpq_ui_div, and
also the _z versions?


I tried to think to the mini-mpq version of the _z flavor.

diff -r ed0406cf3c70 mini-gmp/mini-mpq.c
--- a/mini-gmp/mini-mpq.c   Wed Feb 02 19:16:36 2022 +0100
+++ b/mini-gmp/mini-mpq.c   Tue Feb 08 10:04:55 2022 +0100
@@ -395,6 +395,30 @@
 }

 void
+mpq_mul_z (mpq_t r, const mpq_t a, const mpz_t b)
+{
+  mpq_t t;
+  static const mp_limb_t u = 1;
+
+  mpq_mul (r, a, mpq_roinit_normal_nn (t, b->_mp_d, b->_mp_size, &u, 1));

+}
+
+void
+mpq_z_div (mpq_t r, const mpz_t b, const mpq_t a)
+{
+  mpq_t t;
+  mpq_mul_z (r, mpq_roinit_zz (t, mpq_denref (a), mpq_numref (a)), b);
+}
+
+void
+mpq_div_z (mpq_t r, const mpq_t a, const mpz_t b)
+{
+  mpq_t t;
+  mpq_z_div (r, b, a);
+  mpq_inv (r, r);
+}
+
+void
 mpq_div_2exp (mpq_t r, const mpq_t q, mp_bitcnt_t e)
 {
   mp_bitcnt_t z = mpz_scan1 (mpq_numref (q), 0);



It seems simple enough. It should support (but I'm not sure!) the reuse 
cases

mpq_mul_z (B, A, mpq_denref (B))
and
mpq_mul_z (A, A, Z)

but I fear it would not support

mpq_mul_z (A, A, mpq_denref (A))
or
mpq_mul_z (A, A, mpq_numref (A))

Those are actually equivalent respectively to

mpz_set_ui (mpq_denref (A), 1);
and
mpz_mul (mpq_numref (A), mpq_numref (A), mpq_numref (A))

But I think we should support all reuse cases, shouldn't we?

Ĝis,
m


Re: Error handler for GMP?

2022-02-01 Thread Marco Bodrato

Ciao,

Il 2021-06-12 16:49 Marco Bodrato ha scritto:

Anyway, for internal coherence, I think that we should at least check
that all the functions in the library that might "abort" do this using
an "exception". For this, I propose the attached patch.


That patch ("Handle overflow in mpz_type through errno") was pushed.

We still have at least two functions directly using an abort():
__gmp_exception, and __gmp_invalid_operation.

They live in two different files (errno.c and invalid.c respectively), 
which handle raising a signal somewhat differently: the first checks 
#ifdef SIGFPE, the second depends on #if HAVE_UNISTD_H and #if ! 
HAVE_RAISE.


I'm tempted to remove the file invalid.c, and to keep errno.c only, with 
the function __gmp_invalid_operation() calling __gmp_exception as all 
the other exceptions do. Of course with a new 
GMP_ERROR_INVALID_FLOAT_OPERATION = 32 constant.


Any comment?


En passant: the previous patch removed the message printed to stderr. 
Should we recover it using the following?


void
__gmp_overflow_in_mpz (void)
{
  fprintf (stderr, "gmp: overflow in mpz type\n"); /*This line added.*/
  __gmp_exception (GMP_ERROR_MPZ_OVERFLOW);
}

Should we print a message also for the other exceptions in errno.c?

Ĝis,
m


Re: mpq_mul_ui

2022-01-31 Thread Marco Bodrato
Ciao,

Il Mer, 26 Gennaio 2022 7:53 am, Marco Bodrato ha scritto:
> Il 2022-01-24 10:04 Torbjörn Granlund ha scritto:
>> Marc Glisse  writes:
>>
>>   What would you think of adding mpq_mul_ui, mpq_div_ui, mpq_ui_div, and
>>   also the _z versions?
>>
>> That would make sense to me.
>
> I agree too, I just remember an observation about the naming convention:
> https://gmplib.org/list-archives/gmp-discuss/2018-August/006274.html

...but actually only the _singleui versions of _mul_ and _div_ are really
useful. Moreover, to handle the _twouis version, one would need to check
whether the given ui/ui fraction is normalised.

Shall we start with the _z flavours?

On the other hand, it might be useful to have functions to _add_ or _sub_
small fractions, e.g. +1/2, -4/3.

Ĝis,
m



Re: mpq_mul_ui

2022-01-25 Thread Marco Bodrato

Ciao,

Il 2022-01-24 10:04 Torbjörn Granlund ha scritto:

Marc Glisse  writes:

  What would you think of adding mpq_mul_ui, mpq_div_ui, mpq_ui_div, and
  also the _z versions?

That would make sense to me.


I agree too, I just remember an observation about the naming convention:
https://gmplib.org/list-archives/gmp-discuss/2018-August/006274.html

Ĝis,
m


Re: proposal to change the return type/value of mpz_primorial_ui

2021-11-14 Thread Marco Bodrato

Ciao Shane,

Il 2021-11-11 23:58 Shane Neph ha scritto:

First, thank you for your work.  I will be using it.  I'd like to see
it or similar become part of the GMP library.


I'm not sure you can use it! :-D
I forgot a licence, so the default is "all rights reserved".
I'm sending the code again, with a GPLv3+ licence, and a correction so 
that the function also works for the value 2^32-1 in a 32-bit environment.


Moreover, remember that this implementation uses functions that are 
internal, undocumented, and because of this subject to unexpected 
changes in the future.


Before seeing something like this in GMP, we have to at least decide the 
interface :-)
To write the function, I got inspiration from code by Seth Troisi, who 
used more than one call to the prime counting function to get the n-th 
prime number. Should we allow recycling parts of the computations 
between calls? How?



I also would like to see the primorial function return that same
value because it knows the value.


True, but this is not the only information that the function "knows" that 
might be useful. Why not return, e.g., the largest prime?



In my work, I use primorial in ways similar to factorials, and I use
it over a factorial because it simply grows faster.


Not the implementation in GMP :-)
mpz_primorial_ui is the "square free" version of mpz_fac_ui :-D
Strictly smaller, when the argument is larger than 3.

Ĝis,
m

--
http://bodrato.it/papers/

/* pi (N) -- Returns the number of primes in the range [1..N].

Contributed to the GNU project by Marco Bodrato.

THE FUNCTION IN THIS FILE IS INTERNAL WITH A MUTABLE INTERFACE.
IT IS ONLY SAFE TO REACH IT THROUGH DOCUMENTED INTERFACES.
IN FACT, IT IS ALMOST GUARANTEED THAT IT WILL CHANGE OR
DISAPPEAR IN A FUTURE GNU MP RELEASE.

Copyright 2019-2021 Free Software Foundation, Inc.

This file was written for the GNU MP Library, and it is free software;
you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation;
either version 3 of the License, or (at your option) any later
version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.  */

#include "gmp-impl.h"
#include "longlong.h"

#ifndef PI_LEGENDRE_THRESHOLD
#define PI_LEGENDRE_THRESHOLD 8
#endif

/* n_fto_bit (n) = ((n-1)&(-CNST_LIMB(2)))/3U-1 */
static mp_limb_t
n_fto_bit (mp_limb_t n) { return ((n-5)|1)/3U; }
/* n_fto_bit(x) = gmp_legendre_pi_phi_2 (x) - 2 */

static mp_size_t
primesieve_size (mp_limb_t n) { return n_fto_bit(n) / GMP_LIMB_BITS + 1; }

static mp_limb_t
limb_apprsqrt (mp_limb_t x)
{
  int s;

  ASSERT (x > 2);
  count_leading_zeros (s, x);
  s = (GMP_LIMB_BITS - s) >> 1;
  return ((CNST_LIMB(1) << (s - 1)) + (x >> 1 >> s));
}

/* phi(a, 0) = a
 * phi(a, b) = phi(a, b-1) - phi(a/p_b, b-1)
 * phi(a, 1) = phi(a, 0) - phi(a/2, 0) = a - a/2
 * phi(a, 2) = phi(a, 1) - phi(a/3, 1) = a - a/2 - (a/3 - a/6) ~= a/3
 * phi(a, b) = phi(a, 2) - sum_{2 < i <= b} phi(a/p_i, i-1)  */
static long
gmp_legendre_pi_phi_2 (mp_limb_t n) { return n_fto_bit (n) + 2; }

static long
gmp_legendre_pi_phi (unsigned long n, unsigned primes, mp_ptr sieve)
{
  long res = gmp_legendre_pi_phi_2 (n);
  ASSERT (primes > 0);
  /* if (primes == 0) */
  /*   return res; */

  /* if (res * 3 == n) n -= 2; */
  unsigned count = 0;
  unsigned long i = 4, *sp = sieve;
  do { /* Loop on primes */
for (unsigned long b = i, x = ~ *(sp++); x != 0; b += 3, x >>= 1)
  if (x & 1)
	{
	  /* b = 4 + 3*k :
	   * if k = 2n, b = 4 (mod 6) is even, the prime is b+1,
	   *and b+2 = 0 (mod 6) is composite;
	   * if k = 2n+1, b = 1 (mod 6) is odd, b is the prime,
	   *and b+1 (even), b+2 = 3 (mod 6), b+2+(b%2) (even) are composite.
	   */
	  unsigned long frac = n / (b | 1);

	  if (frac <= (b + 2 + (b & 1)))
	return res - primes + count - (frac >= (b | 1));

	  if (count < 2) /* Directly compute, avoid recursion. */
	res -= gmp_legendre_pi_phi_2 (frac) /* count == 0 */
	  - (count ? gmp_legendre_pi_phi_2 (frac / 5) : 0); /* count == 1 */
	  else
	res -= gmp_legendre_pi_phi(frac, count, sieve);

	  if (primes == ++count)
	return res;
	}
i += GMP_LIMB_BITS * 3;
  } while (1);
}

/* Takes a pre-sieved range of primes [5..maxprime].
 * The range contains primes_in_sieve primes.
 *
 * Legendre showed that, if maxprime^2 >= n, then
 * primepi(n) = primepi(maxprime) + phi(n, primepi(maxprime)) - 1
 * where phi (a, b) = numbers x <= a, not divisible by the first b primes.
 *
 * The sieve does not count 2 and 3, so that
 * primepi(maxprime) = primes_in_sieve + 2
 *
 * Here we pass to pi_phi the prime index of b: primes_in_sieve.
 */
static long
gmp_legendre_pi_ps (unsigned long n, unsigned primes_in_sieve, mp_ptr sieve)
{
#if (ULO

Re: proposal to change the return type/value of mpz_primorial_ui

2021-11-11 Thread Marco Bodrato

Ciao,

Il 2021-11-08 18:57 Marco Bodrato ha scritto:

I got inspired by a piece of code that Seth Troisi sent a couple of
years ago and I wrote a gmp_pi_ui(n) function with two variants, a
trivial one (it calls gmp_primesieve) and an implementation of the
Legendre strategy.

I attach it for comments. Compiled with -DMAIN it prints the number of
primes up to increasing powers of two. On my computer, with small
ranges it is slower than GP/Pari, for large ranges, it is faster...


I checked the results against https://oeis.org/A007053 and 
https://oeis.org/A006880 up to 2^47. Everything seems correct... Of 
course, for "large" (>2^32) values it is a slow function.


It currently takes an ui for the argument. But it should not be 
impossible to implement it also for values larger than 32 bits on 
32-bit CPUs... Is it worth trying?


Ĝis,
m

--
http://bodrato.it/papers/


Re: Doc tweak

2021-11-09 Thread Marco Bodrato

Ciao Marc,

Il 2021-11-06 13:35 Marc Glisse ha scritto:

someone got confused by the text saying that all the GMP declarations
are in gmp.h and thought the C++ class interface would be there as
well, so I'll probably add this patch unless someone has a better
proposition.


I agree.

Ĝis,
m


Re: proposal to change the return type/value of mpz_primorial_ui

2021-11-08 Thread Marco Bodrato

Ciao, Shane,

On Sat, Nov 6, 2021 at 1:31 PM Marco Bodrato  
wrote:

Il 2021-11-03 19:18 Shane Neph ha scritto:
> mpz_primorial_ui( ) is a great function.  Why not return the actual
> number of primes that go into that calculation?

It may make sense, but only if we add also another function: a way to
compute the number of primes in a range; or at least in the
range [1..n], i.e. pi(n) for a given unsigned long n.


Il 2021-11-06 22:29 Shane Neph ha scritto:

That would be a nice solution.


If you really need both the primorial and the prime count, my solution 
is somewhat wasteful, because no memory of the sieved numbers is kept 
between the library calls. The numbers are sieved twice...


I got inspired by a piece of code that Seth Troisi sent a couple of 
years ago and I wrote a gmp_pi_ui(n) function with two variants, a 
trivial one (it calls gmp_primesieve) and an implementation of the 
Legendre strategy.


I attach it for comments. Compiled with -DMAIN it prints the number of 
primes up to increasing powers of two. On my computer, with small ranges 
it is slower than GP/Pari, for large ranges, it is faster...


Ĝis,
m

/* pi (N) -- Returns the number of primes in the range [1..N].

Copyright 2019-2021 Marco Bodrato.

 */

#include "gmp-impl.h"
#include "longlong.h"

#ifndef PI_LEGENDRE_THRESHOLD
#define PI_LEGENDRE_THRESHOLD 8
#endif

/* n_fto_bit (n) = ((n-1)&(-CNST_LIMB(2)))/3U-1 */
static mp_limb_t
n_fto_bit (mp_limb_t n) { return ((n-5)|1)/3U; }
/* n_fto_bit(x) + 2 = gmp_legendre_pi_phi_2 (x) */

static mp_size_t
primesieve_size (mp_limb_t n) { return n_fto_bit(n) / GMP_LIMB_BITS + 1; }

static mp_limb_t
limb_apprsqrt (mp_limb_t x)
{
  int s;

  ASSERT (x > 2);
  count_leading_zeros (s, x);
  s = (GMP_LIMB_BITS - s) >> 1;
  return ((CNST_LIMB(1) << (s - 1)) + (x >> 1 >> s));
}

/* phi(a, 0) = a
 * phi(a, b) = phi(a, b-1) - phi(a/p_b, b-1)
 * phi(a, 1) = phi(a, 0) - phi(a/2, 0) = a - a/2
 * phi(a, 2) = phi(a, 1) - phi(a/3, 1) = a - a/2 - (a/3 - a/6) ~= a/3
 * phi(a, b) = phi(a, 2) - sum_{2 < i <= b} phi(a/p_i, i-1)  */
static long
gmp_legendre_pi_phi_2 (mp_limb_t n) { return n_fto_bit (n) + 2; }

static long
gmp_legendre_pi_phi (unsigned long n, unsigned primes, mp_ptr sieve)
{
  long res = gmp_legendre_pi_phi_2 (n);
  ASSERT (primes > 0);
  /* if (primes == 0) */
  /*   return res; */

  /* if (res * 3 == n) n -= 2; */
  unsigned count = 0;
  unsigned long i = 4, *sp = sieve;
  do { /* Loop on primes */
for (unsigned long b = i, x = ~ *(sp++); x != 0; b += 3, x >>= 1)
  if (x & 1)
	{
	  /* b = 4 + 3*k :
	   * if k = 2n, b = 4 (mod 6) is even, the prime is b+1,
	   *and b+2 = 0 (mod 6) is composite;
	   * if k = 2n+1, b = 1 (mod 6) is odd, b is the prime,
	   *and b+1 (even), b+2 = 3 (mod 6), b+2+(b%2) (even) are composite.
	   */
	  unsigned long frac = n / (b | 1);

	  if (frac <= (b + 2 + (b & 1)))
	return res - primes + count - (frac >= (b | 1));

	  if (count < 2) /* Directly compute, avoid recursion. */
	res -= gmp_legendre_pi_phi_2 (frac) /* count == 0 */
	  - (count ? gmp_legendre_pi_phi_2 (frac / 5) : 0); /* count == 1 */
	  else
	res -= gmp_legendre_pi_phi(frac, count, sieve);

	  if (primes == ++count)
	return res;
	}
i += GMP_LIMB_BITS * 3;
  } while (1);
}

/* Takes a pre-sieved range of primes [5..maxprime].
 * The range contains primes_in_sieve primes.
 *
 * Legendre showed that, if maxprime^2 >= n, then
 * primepi(n) = primepi(maxprime) + phi(n, primepi(maxprime)) - 1
 * where phi (a, b) = numbers x <= a, not divisible by the first b primes.
 *
 * The sieve does not count 2 and 3, so that
 * primepi(maxprime) = primes_in_sieve + 2
 *
 * Here we pass to pi_phi the prime index of b: primes_in_sieve.
 */
static long
gmp_legendre_pi_ps (unsigned long n, unsigned primes_in_sieve, mp_ptr sieve)
{
  return primes_in_sieve + 1 + gmp_legendre_pi_phi(n, primes_in_sieve, sieve);
}

static long
gmp_legendre_pi_ui(unsigned long n)
{
  unsigned long sqrt_n = limb_apprsqrt (n);
  unsigned sieve_size = primesieve_size (sqrt_n);
  TMP_SDECL;

  TMP_SMARK;
  mp_ptr sieve = TMP_SALLOC_LIMBS (sieve_size);
  unsigned count = gmp_primesieve (sieve, sqrt_n);
  long res = gmp_legendre_pi_ps (n, count, sieve);
  TMP_SFREE;

  return res;
}

/* TODO: the _primesieve interface should be extended, to allow
   piece-wise sieving. Storing the full sieve is wasteful here.
*/
static long
gmp_persieve_pi_ui (unsigned long n)
{
  mp_size_t size = primesieve_size (n);
  TMP_DECL;

  TMP_MARK;
  long res = gmp_primesieve (TMP_ALLOC_LIMBS (size), n);
  TMP_FREE;

  return res + 2;
}

long
gmp_pi_ui (unsigned long n)
{
  if (n < 7)
return (n>1) + (n>2) + (n>4);
  if (BELOW_THRESHOLD (n, PI_LEGENDRE_THRESHOLD))
return gmp_persieve_pi_ui (n);
  else
return gmp_legendre_pi_ui (n);
}

#ifdef MAIN


int main() {
  unsigned step = 2;
  for (unsigned long n=1; n< ~0UL / step ; n *= step)
  gmp_printf ("%lu -> %ld\n", n, gmp_pi_ui (n));
}
#endif


Re: proposal to change the return type/value of mpz_primorial_ui

2021-11-06 Thread Marco Bodrato

Ciao,

Il 2021-11-03 19:18 Shane Neph ha scritto:

mpz_primorial_ui( ) is a great function.  Why not return the actual
number of primes that go into that calculation?


It may make sense, but only if we add also another function: a way to
compute the number of primes in a range; or at least in the
range [1..n], i.e. pi(n) for a given unsigned long n.

Without that function, the only way to compute pi(n) in GMP would be:
compute primorial(n), discard the result, and take the return value 
only.


Such a function does not belong to any of the mp[nzfq] layers...
Can the prototype be something like the following?

long gmp_pi_ui(unsigned long n);

Of course, in this area GMP shall not claim to be the
fastest available library ;-D

Ĝis,
m

--
http://bodrato.it/papers/


Re: [Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)

2021-11-01 Thread Marco Bodrato

Ciao,

Il 2020-09-21 17:30 Marco Bodrato ha scritto:

Il 2020-09-14 18:50 Shlomi Fish ha scritto:

I was able to improve upon mpz_mod here:


This is in case the number that one divides by is itself divisible by a
large power of 2.


There are many special forms for the divisors that can stimulate
writing special code to speed up the modulus computation. The form
k*2^n is one of them.


Is there interest in incorporating such a change to the core GMP 
library?



By the way, if you want to play with this, try the below patch for
GMP, recompile it, and test your "benchmark" again.


I slightly reworked the patch. Should we apply it?

It accelerates the functions mpz_?div_{qr,r} when the denominator has 
low zero limbs.


In the general case, checking should cost a single (does UNLIKELY make 
it well predicted?) branch, and just a couple of additional operations 
on pointers are needed: rp + n0, and dl - n0.


We never specially handle such a simple special case in the mpn layer, 
but it may make sense for the mpz layer. We already handle low zero 
limbs in mpz_powm...


Comments?

diff -r a9440b272ec5 mpz/tdiv_qr.c
--- a/mpz/tdiv_qr.c Sun Oct 31 14:59:02 2021 +0100
+++ b/mpz/tdiv_qr.c Mon Nov 01 11:15:08 2021 +0100
@@ -36,7 +36,7 @@
 void
 mpz_tdiv_qr (mpz_ptr quot, mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
 {
-  mp_size_t ql;
+  mp_size_t ql, n0;
   mp_size_t ns, ds, nl, dl;
   mp_ptr np, dp, qp, rp;
   TMP_DECL;
@@ -95,7 +95,12 @@
   np = tp;
 }

-  mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
+  for (n0 = 0; UNLIKELY (*dp == 0); ++dp)
+{
+  rp [n0++] = *np++;
+  --nl;
+}
+  mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);

   ql -=  qp[ql - 1] == 0;
   MPN_NORMALIZE (rp, dl);
diff -r a9440b272ec5 mpz/tdiv_r.c
--- a/mpz/tdiv_r.c  Sun Oct 31 14:59:02 2021 +0100
+++ b/mpz/tdiv_r.c  Mon Nov 01 11:15:08 2021 +0100
@@ -35,7 +35,7 @@
 void
 mpz_tdiv_r (mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
 {
-  mp_size_t ql;
+  mp_size_t ql, n0;
   mp_size_t ns, nl, dl;
   mp_ptr np, dp, qp, rp;
   TMP_DECL;
@@ -88,7 +88,12 @@
   np = tp;
 }

-  mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
+  for (n0 = 0; UNLIKELY (*dp == 0); ++dp)
+{
+  rp [n0++] = *np++;
+  --nl;
+}
+  mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);

   MPN_NORMALIZE (rp, dl);

Ĝis,
m

--
http://bodrato.it/papers/


Re: Suggested tune/tuneup.c patch

2021-11-01 Thread Marco Bodrato

Ciao,

Il 2021-11-01 01:04 Torbjörn Granlund ha scritto:

Marco Bodrato  writes:



  fac_ui... and I find an empty page if I look at
  https://gmplib.org/devel/thres/2021-10-30/FAC_ODD_THRESHOLD

Still there, but for some reason nginx had stopped serving gz pages, and
there were no non-gzip pages on the server.  I made a quick workaround.


And the workaround works. Thanks!

Ĝis,
m


Re: Suggested tune/tuneup.c patch

2021-10-31 Thread Marco Bodrato

Ciao,

Il 2019-09-11 19:37 t...@gmplib.org ha scritto:
I added a history preservation feature to the .../devel/thres/ pages.  At
23:59 each night, all pages are copied to .../devel/thres/YYYY-MM-DD.
(There is no index, one needs to type in the wanted date manually.)


Is this still active?
I can access https://gmplib.org/devel/thres/2021-10-30/ , but I'd like 
to check how FAC_ODD_THRESHOLD evolves after my last commit to fac_ui... 
and I find an empty page if I look at

https://gmplib.org/devel/thres/2021-10-30/FAC_ODD_THRESHOLD

Ĝis,
m


Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Marco Bodrato

Ciao Niels,

Il 2021-10-06 21:54 ni...@lysator.liu.se ha scritto:

Now, question is if it can beat mul_1 + addmul_1. I don't know.


Well, the question is also whether the current code beats mul_1 + addmul_1 
:-)


Ĝis,
m


Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Marco Bodrato

Ciao,

Il 2021-10-06 17:21 ni...@lysator.liu.se ha scritto:

Marco Bodrato  writes:

I agree. Let's choose between the two last versions, and I'll push it.


Nice with a few instructions less, feel free to push the version you
like best.


I pushed the last one:
https://gmplib.org/repo/gmp/rev/8e8ae372361b

Ĝis,
m


Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Marco Bodrato

Ciao Niels,

Il 2021-10-05 20:56 ni...@lysator.liu.se ha scritto:

Marco Bodrato  writes:

mov %r11, -16(rp)
mov %r10, %r11
jmp L(one)


I had hoped this jump and preceding instructions could be eliminated, to
get a structure like

ja  L(two)
jz  L(one)

L(nul): (no jumps to this label left)
...



But might need other move instructions, to get the right data into the
right registers?


Maybe with a couple of move with computed address, and with the same 
value in r10 and r11 we can get rid of the jump.

The difference, wrt the last code I sent, follows.

***
*** 122,135 
cmp $1, R32(n)
ja  L(two)
!   jnz L(nul)
!
!   mov -8(ap), %rax
!   mov %r11, -16(rp)
mov %r10, %r11
!   jmp L(one)

! L(nul):   mov -16(ap), %rax
!   mov %r11, -24(rp)
!   mul %r8
add %rax, %r10
mov -16(bp), %rax
--- 122,131 
cmp $1, R32(n)
ja  L(two)
!   mov -16(ap,n,8), %rax
!   mov %r11, -24(rp,n,8)
mov %r10, %r11
!   jz  L(one)

! L(nul):   mul %r8
add %rax, %r10
mov -16(bp), %rax


So I think your version is an improvement as is, and perhaps not worth
the effort to try to eliminate a few more instructions in this rather
obscure function.


I agree. Let's choose between the two last versions, and I'll push it.

Ĝis,
m


Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-05 Thread Marco Bodrato

Il 2021-10-05 07:38 ni...@lysator.liu.se ha scritto:
Some potential further improvements: It would likely be possible to 
order the

cases at the end as L(nul), L(one), L(two), and let the nul case fall
through into the one case, reducing the size a bit. And a mulx version
could likely eliminate a lot of the move instructions.


Well, I added one more move to order the cases as you suggest. The code 
gets a little bit shorter.



I think its performance matters mainly for
gcdext in the Lehmer size range. (It's also used by hgcd base case, but
for sizes where that is used, I'd guess that it's a pretty small part of
the complete gcd computation).


I also renamed registers, so that a push/pop pair is needed only if 
the loop is used; this may save a couple of cycles when the size is 
small. Does this make sense?


I attach the possible new version.

Ĝis,
m

dnl  AMD64 mpn_addaddmul_1msb0, R = Au + Bv, u,v < 2^63.

dnl  Copyright 2008, 2021 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.
dnl
dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of either:
dnl
dnl* the GNU Lesser General Public License as published by the Free
dnl  Software Foundation; either version 3 of the License, or (at your
dnl  option) any later version.
dnl
dnl  or
dnl
dnl* the GNU General Public License as published by the Free Software
dnl  Foundation; either version 2 of the License, or (at your option) any
dnl  later version.
dnl
dnl  or both in parallel, as here.
dnl
dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
dnl  for more details.
dnl
dnl  You should have received copies of the GNU General Public License and the
dnl  GNU Lesser General Public License along with the GNU MP Library.  If not,
dnl  see https://www.gnu.org/licenses/.

include(`../config.m4')

C	 cycles/limb
C AMD K8,K9	 2.167
C AMD K10	 2.167
C Intel P4	12.0
C Intel core2	 4.0
C Intel corei	 ?
C Intel atom	 ?
C VIA nano	 ?

C TODO
C  * Perhaps handle various n mod 3 sizes better.  The code now is too large.

C INPUT PARAMETERS
define(`rp',	`%rdi')
define(`ap',	`%rsi')
define(`bp_param', `%rdx')
define(`n',	`%rcx')
define(`u0',	`%r8')
define(`v0',	`%r9')


define(`bp', `%rbp')

ABI_SUPPORT(DOS64)
ABI_SUPPORT(STD64)

ASM_START()
	TEXT
	ALIGN(16)
PROLOGUE(mpn_addaddmul_1msb0)
FUNC_ENTRY(4)
IFDOS(`	mov	56(%rsp), %r8	')
IFDOS(`	mov	64(%rsp), %r9	')
	push	%rbp

	lea	(ap,n,8), ap
	lea	(bp_param,n,8), bp
	lea	(rp,n,8), rp
	neg	n

	mov	(ap,n,8), %rax
	mul	%r8
	mov	%rax, %r11
	mov	(bp,n,8), %rax
	mov	%rdx, %r10
	add	$3, n
	jns	L(end)

	push	%r13

	ALIGN(16)
L(top):	mul	%r9
	add	%rax, %r11
	mov	-16(ap,n,8), %rax
	adc	%rdx, %r10
	mov	%r11, -24(rp,n,8)
	mul	%r8
	add	%rax, %r10
	mov	-16(bp,n,8), %rax
	mov	$0, R32(%r13)
	adc	%rdx, %r13
	mul	%r9
	add	%rax, %r10
	mov	-8(ap,n,8), %rax
	adc	%rdx, %r13
	mov	%r10, -16(rp,n,8)
	mul	%r8
	add	%rax, %r13
	mov	-8(bp,n,8), %rax
	mov	$0, R32(%r11)
	adc	%rdx, %r11
	mul	%r9
	add	%rax, %r13
	adc	%rdx, %r11
	mov	(ap,n,8), %rax
	mul	%r8
	add	%rax, %r11
	mov	%r13, -8(rp,n,8)
	mov	(bp,n,8), %rax
	mov	$0, R32(%r10)
	adc	%rdx, %r10
	add	$3, n
	js	L(top)

	pop	%r13

L(end):	mul	%r9
	add	%rax, %r11
	adc	%rdx, %r10
	cmp	$1, R32(n)
	ja	L(two)
	jnz	L(nul)

	mov	-8(ap), %rax
	mov	%r11, -16(rp)
	mov	%r10, %r11
	jmp	L(one)

L(nul):	mov	-16(ap), %rax
	mov	%r11, -24(rp)
	mul	%r8
	add	%rax, %r10
	mov	-16(bp), %rax
	mov	$0, R32(%r11)
	adc	%rdx, %r11
	mul	%r9
	add	%rax, %r10
	mov	-8(ap), %rax
	adc	%rdx, %r11
	mov	%r10, -16(rp)
L(one):	mul	%r8
	add	%rax, %r11
	mov	-8(bp), %rax
	mov	$0, R32(%r10)
	adc	%rdx, %r10
	mul	%r9
	add	%rax, %r11
	adc	%rdx, %r10

L(two):	mov	%r11, -8(rp)
	mov	%r10, %rax
L(ret):	pop	%rbp
	FUNC_EXIT()
	ret
EPILOGUE()


Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-04 Thread Marco Bodrato

Ciao Niels,

you are the author of this code.

Il 2020-06-07 21:48 Idigger Lee ha scritto:

Please update addaddmul_1msb0.asm to support ABI in mingw64.


While looking at this e-mail on gmp-bugs, I added DOS support and also 
reordered the branches around the exit code.

Do you agree with those changes?

8<-
*** 
/tmp/extdiff.X4CyBq/gmp-err.b03b3cb1ce98/mpn/x86_64/addaddmul_1msb0.asm	2020-11-16 
21:41:15.0 +0100
--- /home/bodrato/src/gmp-err/mpn/x86_64/addaddmul_1msb0.asm	2020-10-29 
04:37:12.369375684 +0100

*** define(`u0',`%r8')
*** 51,64 
--- 51,70 
  define(`v0',  `%r9')


  define(`bp', `%rbp')

+ ABI_SUPPORT(DOS64)
+ ABI_SUPPORT(STD64)
+
  ASM_START()
TEXT
ALIGN(16)
  PROLOGUE(mpn_addaddmul_1msb0)
+ FUNC_ENTRY(4)
+ IFDOS(`   mov 56(%rsp), %r8   ')
+ IFDOS(`   mov 64(%rsp), %r9   ')
push%r12
push%rbp

lea (ap,n,8), ap
lea (bp_param,n,8), bp
*** L(top): mul %r9
*** 105,170 
mov $0, R32(%r10)
adc %rdx, %r10
add $3, n
js  L(top)

! L(end):   cmp $1, R32(n)
!   ja  2f
!   jz  1f
!
!   mul %r9
add %rax, %r12
-   mov -16(ap), %rax
adc %rdx, %r10
!   mov %r12, -24(rp)
mul %r8
add %rax, %r10
!   mov -16(bp), %rax
mov $0, R32(%r11)
adc %rdx, %r11
mul %r9
add %rax, %r10
-   mov -8(ap), %rax
adc %rdx, %r11
!   mov %r10, -16(rp)
mul %r8
!   add %rax, %r11
!   mov -8(bp), %rax
mov $0, R32(%r12)
adc %rdx, %r12
mul %r9
!   add %rax, %r11
!   adc %rdx, %r12
!   mov %r11, -8(rp)
!   mov %r12, %rax
!   pop %rbp
!   pop %r12
!   ret
!
! 1:mul %r9
!   add %rax, %r12
mov -8(ap), %rax
!   adc %rdx, %r10
!   mov %r12, -16(rp)
mul %r8
!   add %rax, %r10
mov -8(bp), %rax
!   mov $0, R32(%r11)
!   adc %rdx, %r11
mul %r9
-   add %rax, %r10
-   adc %rdx, %r11
-   mov %r10, -8(rp)
-   mov %r11, %rax
-   pop %rbp
-   pop %r12
-   ret
-
- 2:mul %r9
add %rax, %r12
-   mov %r12, -8(rp)
adc %rdx, %r10
mov %r10, %rax
!   pop %rbp
pop %r12
ret
  EPILOGUE()
--- 111,164 
mov $0, R32(%r10)
adc %rdx, %r10
add $3, n
js  L(top)

! L(end):   mul %r9
add %rax, %r12
adc %rdx, %r10
!   cmp $1, R32(n)
!   ja  L(two)
!   jnz L(nul)
!
! L(one):   mov -8(ap), %rax
!   mov %r12, -16(rp)
mul %r8
add %rax, %r10
!   mov -8(bp), %rax
mov $0, R32(%r11)
adc %rdx, %r11
mul %r9
add %rax, %r10
adc %rdx, %r11
!   mov %r10, -8(rp)
!   mov %r11, %rax
!   jmp L(ret)
!
! L(nul):   mov -16(ap), %rax
!   mov %r12, -24(rp)
mul %r8
!   add %rax, %r10
!   mov -16(bp), %rax
mov $0, R32(%r12)
adc %rdx, %r12
mul %r9
!   add %rax, %r10
mov -8(ap), %rax
!   adc %rdx, %r12
!   mov %r10, -16(rp)
mul %r8
!   add %rax, %r12
mov -8(bp), %rax
!   mov $0, R32(%r10)
!   adc %rdx, %r10
mul %r9
add %rax, %r12
adc %rdx, %r10
+
+ L(two):   mov %r12, -8(rp)
mov %r10, %rax
! L(ret):   pop %rbp
pop %r12
+   FUNC_EXIT()
ret
  EPILOGUE()
8<-

Ĝis,
m


Re: div_qr_1n_pi1

2021-06-14 Thread Marco Bodrato

Ciao,

Il 2021-06-06 22:16 Torbjörn Granlund ha scritto:

ni...@lysator.liu.se (Niels Möller) writes:

  Maybe we should have some macrology for that? Or do all relevant
  processors and compilers support efficient cmov these days? I'm 
sticking

  to masking expressions for now.

Let's not trust results from compiler generated code for these things.
The mixture of inline asm and plain code is hard for compilers to deal
with.  Very subtle things can make a huge cycle count difference.


Of course, mixing asm and plain code will not let the compiler much 
freedom...


Should we try if the compiler supports a larger type (e.g. unsigned 
__int128) and define the common macros add_ssaaaa and umul_ppmm based on 
it? In that case the compiler should be able to optimise also across the 
longlong-defined operations.


Ĝis,
m


Re: Error handler for GMP?

2021-06-12 Thread Marco Bodrato

Ciao John! It's a pleasure to see you around here!

Il 2021-03-22 09:55 abb...@dima.unige.it ha scritto:
Does GMP offer a way to return/throw rather than simply aborting upon  
overflow?


No, it doesn't. Yet.


I could not see anything relevant in the documentation.


The theme emerges every now and then on the list.

And the problem is that it seems difficult, for the caller, to handle 
possible memory leaks, and coherence of the internal state to do 
anything better than to abort...


But I think that lazy allocation, which we have since GMP 6.2.0, can help.

I think it should be possible to write a function, using mpz_t or mpq_t 
only, that:


1) has a clear distinction between readable-only and writable-only 
parameters.
2) frees (mpz_clear) all the writable variables, and re-inits them 
(mpz_init, which is lazy now, and does not allocate anything).
3) sets (mp_set_memory_functions) memory functions that will keep track 
of any block allocated by GMP functions

4) sets the callback on error...
5) does the heavy work
6) resets callback and memory functions and returns "done".

If during step 5 an error occurs:
6) frees any memory block allocated after step 3;
7) re-inits all the writable variables (mpz_init, to set everything to a 
coherent state)

8) resets callback and memory functions and returns "fail".

Anyway, for internal coherence, I think that we should at least check 
that all the functions in the library that might "abort" do this using 
an "exception". For this, I propose the attached patch.


Then, we might think if we want to let the user install a hook on the 
__gmp_exception function.


Ĝis,
m

diff -r eac10a64bc99 errno.c
--- a/errno.c	Sun Jun 06 23:29:28 2021 +0200
+++ b/errno.c	Sat Jun 12 16:46:13 2021 +0200
@@ -70,3 +70,8 @@
 {
   __gmp_exception (GMP_ERROR_DIVISION_BY_ZERO);
 }
+void
+__gmp_overflow_in_mpz (void)
+{
+  __gmp_exception (GMP_ERROR_MPZ_OVERFLOW);
+}
diff -r eac10a64bc99 gmp-h.in
--- a/gmp-h.in	Sun Jun 06 23:29:28 2021 +0200
+++ b/gmp-h.in	Sat Jun 12 16:46:13 2021 +0200
@@ -2325,7 +2325,8 @@
   GMP_ERROR_UNSUPPORTED_ARGUMENT = 1,
   GMP_ERROR_DIVISION_BY_ZERO = 2,
   GMP_ERROR_SQRT_OF_NEGATIVE = 4,
-  GMP_ERROR_INVALID_ARGUMENT = 8
+  GMP_ERROR_INVALID_ARGUMENT = 8,
+  GMP_ERROR_MPZ_OVERFLOW = 16
 };
 
 /* Define CC and CFLAGS which were used to build this version of GMP */
diff -r eac10a64bc99 gmp-impl.h
--- a/gmp-impl.h	Sun Jun 06 23:29:28 2021 +0200
+++ b/gmp-impl.h	Sat Jun 12 16:46:13 2021 +0200
@@ -3924,10 +3924,12 @@
 __GMP_DECLSPEC void __gmp_exception (int) ATTRIBUTE_NORETURN;
 __GMP_DECLSPEC void __gmp_divide_by_zero (void) ATTRIBUTE_NORETURN;
 __GMP_DECLSPEC void __gmp_sqrt_of_negative (void) ATTRIBUTE_NORETURN;
+__GMP_DECLSPEC void __gmp_overflow_in_mpz (void) ATTRIBUTE_NORETURN;
 __GMP_DECLSPEC void __gmp_invalid_operation (void) ATTRIBUTE_NORETURN;
 #define GMP_ERROR(code)   __gmp_exception (code)
 #define DIVIDE_BY_ZERO__gmp_divide_by_zero ()
 #define SQRT_OF_NEGATIVE  __gmp_sqrt_of_negative ()
+#define MPZ_OVERFLOW  __gmp_overflow_in_mpz ()
 
 #if defined _LONG_LONG_LIMB
 #define CNST_LIMB(C) ((mp_limb_t) C##LL)
diff -r eac10a64bc99 mpz/init2.c
--- a/mpz/init2.c	Sun Jun 06 23:29:28 2021 +0200
+++ b/mpz/init2.c	Sat Jun 12 16:46:13 2021 +0200
@@ -43,10 +43,7 @@
   if (sizeof (unsigned long) > sizeof (int)) /* param vs _mp_size field */
 {
   if (UNLIKELY (new_alloc > INT_MAX))
-	{
-	  fprintf (stderr, "gmp: overflow in mpz type\n");
-	  abort ();
-	}
+	MPZ_OVERFLOW;
 }
 
   PTR(x) = __GMP_ALLOCATE_FUNC_LIMBS (new_alloc);
diff -r eac10a64bc99 mpz/realloc.c
--- a/mpz/realloc.c	Sun Jun 06 23:29:28 2021 +0200
+++ b/mpz/realloc.c	Sat Jun 12 16:46:13 2021 +0200
@@ -44,18 +44,12 @@
   if (sizeof (mp_size_t) == sizeof (int))
 {
   if (UNLIKELY (new_alloc > ULONG_MAX / GMP_NUMB_BITS))
-	{
-	  fprintf (stderr, "gmp: overflow in mpz type\n");
-	  abort ();
-	}
+	MPZ_OVERFLOW;
 }
   else
 {
   if (UNLIKELY (new_alloc > INT_MAX))
-	{
-	  fprintf (stderr, "gmp: overflow in mpz type\n");
-	  abort ();
-	}
+	MPZ_OVERFLOW;
 }
 
   if (ALLOC (m) == 0)
diff -r eac10a64bc99 mpz/realloc2.c
--- a/mpz/realloc2.c	Sun Jun 06 23:29:28 2021 +0200
+++ b/mpz/realloc2.c	Sat Jun 12 16:46:13 2021 +0200
@@ -43,10 +43,7 @@
   if (sizeof (unsigned long) > sizeof (int)) /* param vs _mp_size field */
 {
   if (UNLIKELY (new_alloc > INT_MAX))
-	{
-	  fprintf (stderr, "gmp: overflow in mpz type\n");
-	  abort ();
-	}
+	MPZ_OVERFLOW;
 }
 
   if (ALLOC (m) == 0)
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: pointers vs arrays

2021-06-09 Thread Marco Bodrato

Ciao,

Il 2021-05-12 15:17 Marc Glisse ha scritto:

the latest version of gcc has a new (questionable) warning that
complains if a function is declared as taking a mpz_t parameter and
redeclared as taking mpz_ptr, or the reverse.

We might as well be consistent and use pointers everywhere, as in the
attached patch. Does someone disagree?


Maybe this will move the warnings to the users' side :-/
By the way, I think that also the documentation should be updated 
accordingly.


Ĝis,
m


Re: div_qr_1n_pi1

2021-06-03 Thread Marco Bodrato

Ciao,

Il 2021-06-03 12:40 Torbjörn Granlund ha scritto:

If we dare use cmov (and its presumed side-channel leakage) we could
probably shorten the critical path by a cycle.  The "sbb" and "and"
would go away.


Using masks does not always give the fastest code. I tried the following
variation on Niels' code, and, on my laptop with "g++-10 -O2
-mtune=icelake-client -march=icelake-client", the resulting code is
comparable with (maybe faster than?) the current asm.


*** mpn_div_qr_1n_pi1 (mp_ptr qp, mp_srcptr
*** 245,266 
 * +  | q0|
 *   -+---+---+---+
 *| q2| q1| q0|
 *+---+---+---+
*/
!   umul_ppmm (p1, t, u1, dinv);
!   add_ss (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);
!   add_ss (q2, q1, q2, q1, CNST_LIMB(0), p1);
!   add_ss (q2, q1, q2, q1, CNST_LIMB(0), q0);
!   q0 = t;

umul_ppmm (p1, p0, u1, B2);
-   ADDC_LIMB (cy, u0, u0, u2 & B2);
-   u0 -= (-cy) & d;

/* Final q update */
!   add_ss (q2, q1, q2, q1, CNST_LIMB(0), cy);
qp[j+1] = q1;
MPN_INCR_U (qp+j+2, n-j-2, q2);

add_mss (u2, u1, u0, u0, up[j], p1, p0);
  }
--- 245,264 
 * +  | q0|
 *   -+---+---+---+
 *| q2| q1| q0|
 *+---+---+---+
*/
!   ADDC_LIMB (q2, q1, q0, u1);
!   umul_ppmm (t, q0, u1, dinv);
!   ADDC_LIMB (cy, u0, u0, u2 ? B2 : 0);
!   u0 -= cy ? d : 0;
!   add_ss (q2, q1, q2, q1, -u2, u2 ? dinv : 0);

umul_ppmm (p1, p0, u1, B2);

/* Final q update */
!   add_ss (q2, q1, q2, q1, CNST_LIMB(0), t + cy);
qp[j+1] = q1;
MPN_INCR_U (qp+j+2, n-j-2, q2);

add_mss (u2, u1, u0, u0, up[j], p1, p0);
  }

$ build/tune/speed -p1000 -s1-100 -f1.6 -C 
mpn_div_qr_1n_pi1.999 ...

   ASM-codeC-code
1  2.1227   1  3.6125
2  3.1758   2  3.9425
3  3.4567   3  3.8861
4  3.4758   4  3.8606
6  3.7857   6  3.8764
9  3.9912   9  3.9676
14 4.0304   14 4.0531
22 4.3461   22 4.1798
35 4.4161   35 4.2080
56 4.4744   56 4.2833
89 4.4896   89 4.2950


(I am a bit fixated with side-channel leakage; our present
implementations of these particular functions are not side-channel
silent.)


We should write a fast version, and then a sec_ one :-)

Ĝis,
m


Re: Side channel silent karatsuba / mpn_addmul_2 karatsuba

2020-12-08 Thread Marco Bodrato

Ciao,

since someone is asking about secure powm... I reply to an old message 
:-)


Il 2018-12-13 07:05 ni...@lysator.liu.se ha scritto:

t...@gmplib.org (Torbjörn Granlund) writes:

I am looking into doing karatsuba multiplication at evaluation points 0,
+1, and infinity.  We usually evaluate in -1 instead of +1.


It will be very interesting to see what thresholds we get with that.


At first glance, I'd say around a dozen limbs higher than the non-sec
thresholds.


You can try, by replacing the two files mpn/generic/toom2{2_mul,_sqr}.c 
with the attached ones. Then make check and make tune, to see what 
happens.


Of course the two functions should not replace the current ones, but 
that was the easiest way to test the new files :-)


The main advantage of evaluating in +1 is that it makes it more
straightforward to avoid side channel leakage.  (Currently, we
completely avoid all o(n^2) multiplication algorithms in side channel
sensitive contexts.)


I don't think we should rule out using -1. It needs side-channel silent
conditional negation, but that's not so hard. sec_invert.c implements


The attached code for squaring, uses -1.
I.e. it is strictly equivalent to the non-sec version. I only replaced 
the strategy to obtain the absolute value of the difference, and carry 
propagation. It also has exactly the same memory usage.



mpn_cnd_neg in terms of mpn_lshift and mpn_cnd_sub_n, one could also


I'm recycling your code here, in mpn_sec_absub.


What's most efficient is not at all clear to me: negation costs O(n),
but so does handling of the extra top bits one get with evaluation in
+1.


That's true only for the last recursion level, when you fall into 
_basecase...


I know that toom2_sqr enters the game later wrt toom22_mul. But in the 
_sec_ area, to compute the square of a number we use sqr_basecase if the 
number is small enough, and mul_basecase for larger integers... so that 
a Karatsuba function may be more important for squaring than for the 
product...


Ĝis,
m

/* mpn_sec_toom2_sqr -- Square {ap,an}.

   Contributed to the GNU project by Torbjorn Granlund.
   Modified by Marco BODRATO to obtain a side-channel silent version.

Copyright 2006-2010, 2012, 2014, 2018, 2020 Free Software Foundation, Inc.

This code is free software; you can redistribute it and/or modify it
under the terms of the GNU Affero General Public License as published
by the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

You should have received a copy of the GNU Affero General Public
License.  If not, see https://www.gnu.org/licenses/.  */

#include "gmp-impl.h"

static mp_limb_t
mpn_sec_add (mp_ptr rp,
	 mp_srcptr ap, mp_size_t an,
	 mp_srcptr bp, mp_size_t bn,
	 mp_ptr tp)
{
  mp_limb_t cy;

  ASSERT (an >= bn);

  cy = mpn_add_n (rp, ap, bp, bn);
  if (an != bn)
cy = mpn_sec_add_1 (rp + bn, ap + bn, an - bn, cy, tp);
  return cy;
}

static mp_size_t
mpn_sec_add_itch (mp_size_t an, mp_size_t bn)
{
  ASSERT (an >= bn);
  return mpn_sec_add_1_itch (an - bn);
}

static mp_limb_t
mpn_sec_sub (mp_ptr rp,
	 mp_srcptr ap, mp_size_t an,
	 mp_srcptr bp, mp_size_t bn,
	 mp_ptr tp)
{
  mp_limb_t bw;

  ASSERT (an >= bn);

  bw = mpn_sub_n (rp, ap, bp, bn);
  if (an != bn)
bw = mpn_sec_sub_1 (rp + bn, ap + bn, an - bn, bw, tp);
  return bw;
}

static mp_size_t
mpn_sec_sub_itch (mp_size_t an, mp_size_t bn)
{
  ASSERT (an >= bn);
  return mpn_sec_sub_1_itch (an - bn);
}

static mp_limb_t
mpn_sec_absub (mp_ptr rp,
	   mp_srcptr ap, mp_size_t an,
	   mp_srcptr bp, mp_size_t bn,
	   mp_ptr tp)
{
  mp_limb_t bw;

  ASSERT (an >= bn);

  bw = mpn_sec_sub (rp, ap, an, bp, bn, tp);

  /* mpn_cnd_neg_ip (bw, rp, n, tp); */
#if 0
  mpn_sec_sub_1 (rp, rp, an, bw, tp);
  MPN_FILL (tp, an, -bw);
  mpn_xor_n (rp, rp, tp, an);
#else
  mpn_lshift (tp, rp, an, 1);
  mpn_cnd_sub_n (bw, rp, rp, tp, an);
#endif

  return bw;
}

static mp_size_t
mpn_sec_absub_itch (mp_size_t an, mp_size_t bn)
{
  ASSERT (an >= mpn_sec_sub_itch (an, bn));
  ASSERT (an >= mpn_sec_sub_1_itch (an));
  return an;
}

/* Evaluate in: -1, 0, +inf

  <-s--><--n-->
    __
  |_a1_|___a0_|

  v0  =  a0 ^2  #   A(0)^2
  vm1 = (a0- a1)^2  #  A(-1)^2
  vinf=  a1 ^2  # A(inf)^2
*/

#if TUNE_PROGRAM_BUILD || WANT_FAT_BINARY
#define MAYBE_sqr_toom2   1
#else
#define MAYBE_sqr_toom2			\
  (SQR_TOOM3_THRESHOLD >= 2 * SQR_TOOM2_THRESHOLD)
#endif

#define TOOM2_SQR_REC(p, a, n, ws)	\
  do {	\
if (! MAYBE_sqr_toom2		\
	|| BELOW_THRESHOLD (n, SQR_TOOM2_THRESHOLD))			\
  mpn_sqr_basecase (p, a, n);	\
else\
  mpn_toom2_sqr (p, a, n, ws);	\
  } while (0)

void
mpn_toom2_sqr (mp_ptr pp,
	   mp_srcptr ap, mp_size_t an,
	   mp_ptr scratch)
{
  const int __gmpn_cpuvec_initialized = 1;
  mp_size_t n, s;
  mp_limb_t cy, cy2;
  mp_ptr asm1;

#define a0  ap
#define a1  (a

Re: mpz_prevprime

2020-11-28 Thread Marco Bodrato

Ciao,

Il 2020-11-16 21:23 Seth Troisi ha scritto:

Thanks Marco! I'm really happy this is merged.


Your code was merged with a storm of new code that has been reversed. 
But now it's in again!



Was there any specific things you thought could be improved? Happy to
look at those too.


Memory usage, in my opinion, is not optimal.

The current code generates a primesieve, only to loop over it once while
building a list of gaps that is itself usually used just once... I'm still not
completely convinced that this structure is the best we can do.


In the meanwhile I wonder if a further modularization could allow us to
build a type to store the "current state" and functions to sequentially 
find the next primes...


Ĝis,
m



Re: mpz_prevprime

2020-11-16 Thread Marco Bodrato

Ciao,

Il 2020-10-16 09:51 Seth Troisi ha scritto:

I included a patch with the rename but not including Marco's tdiv


I pushed most of the changes you proposed.

I only removed some portion of the proposed tests/mpz/t-nextprime.c.
You are right, also the return value of the new function should be 
tested, but I think that we should find a way to check both the result 
and the return value of single runs of the function. E.g. test_largegap 
can check that mpz_prevprime always returns 1 or 2, and run_p may take 
another parameter so that we can check (depending on the range) if 2 is 
returned as expected, or 1 is enough...


Let's see if we can improve it even more :-)

Ĝis,
m


Re: 6.2.0 build failure on x86_64 Skylake PC - FIX

2020-10-25 Thread Marco Bodrato

Ciao,

Il 2020-10-15 07:14 Marco Bodrato ha scritto:

we are not changing much the gmp-6.2 branch, and we should release.

Is there something important we are missing as a bug fix?


I searched the gmp-devel mailing list and we failed to take any action
after the following, about msys. It seems that we currently are not
handling *-*-msys... is one of the developers able to test whether the
proposed change is useful? Or is something more needed, as Marc Glisse
suggests?


Il 2020-06-08 21:52 Marc Glisse ha scritto:

On Sat, 30 May 2020, tsurzin wrote:

This change worked to build test and run a version of GMP-6.2.0 for my 
PC.


[handle *-*-msys exactly the same as *-*-mingw* in configure.ac]

The PC is running Msys2 under Windows 10 and without change GMP failed 
to build.


configfsf.guess does mention a triple *-pc-msys, so I guess it makes
sense to handle it (or replace it with something we already handle). I
don't really know in what ways it differs from mingw, probably not
that much as far as GMP is concerned.

I notice in a generated file:

aclocal.m4:  *-*-mingw* ) # actually msys

Should automake also be taught about the msys triple?


(the thread is available at 
https://gmplib.org/list-archives/gmp-bugs/2020-May/thread.html )


Ĝis,
m


Re: mpz_prevprime

2020-10-16 Thread Marco Bodrato

Ciao,

Il 2020-10-16 09:51 Seth Troisi ha scritto:

won't this cause m = 2*prime, instead of the original result, m = 0?
I used m = (diff > 0 && m) ? prime - m : m;
to make sure that p can be marked as composite.


Oops, you are right!

I fear that the short-circuit evaluation of && can force the compiler to 
branch...

(diff > 0) & (m != 0) should be computed branchless...
But that's a stupid detail; as you said, most of the time is spent in the
mod function or in millerrabin :-)


Ĝis,
m


Re: mpz_prevprime

2020-10-16 Thread Marco Bodrato
Post scriptum:

Il Ven, 16 Ottobre 2020 7:13 am, Marco Bodrato ha scritto:
>m = mpz_tdiv_ui(p, prime);
>m = (diff > 0) ? prime - m : m;

I remember that, in this context, if p = 0 (mod prime), the result m =
prime is as good as the result m = 0. Because the next two lines are:
 if (m & 1)
   m += prime;

Ĝis,
m



Re: mpz_prevprime

2020-10-15 Thread Marco Bodrato
Ciao,

Il Ven, 16 Ottobre 2020 1:47 am, Seth Troisi ha scritto:
> On Thu, Oct 15, 2020 at 10:44 AM Niels Möller 
>> Seth Troisi  writes:
>> > Technically I can restructure the code to make both use with
>> mpz_fdiv_ui

>> >  static int
>> >  findnext (mpz_ptr p,
>> > -  unsigned long(*nextmod_func)(const mpz_t, unsigned long),
>> > +  char direction,
>> >void(*nextseq_func)(mpz_t, const mpz_t, unsigned long))

>> I see. I don't think it's worth taking out a function pointer if it's
>> replaced with a flag and a couple of if statements. It would be neat if
>> it could be done similarly to findnext_small

Well, both mpz_fdiv and mpz_cdiv have if statements in their code, to
differently handle positive and negative numbers.
We can avoid those branches using tdiv, and add one here (the compiler may
use a cmov). I mean:

/* we have 0 < prime <= 15001 < 2^28, because of calculate_sievelimit */
#if GMP_NUMB_BITS >= 28
   m = mpn_mod_1 (PTR (p), SIZ (p), (mp_limb_t) prime);
#else
   m = mpz_tdiv_ui(p, prime);
#endif
   m = (diff > 0) ? prime - m : m;

>> with diff being +2 or -2 (and possibly generalized further), and no
>> conditionals, e.g.,
>>
>>   mpz_add_si (p, p, diff * (odds_in_composite_sieve-1));

We don't have such a function :-) But adding a static function like that
here, assuming that p is always positive and greater than |si|, with a
single branch... can be a good idea.

Ĝis,
m



Re: mpz_prevprime

2020-10-03 Thread Marco Bodrato

Ciao,

Il 2020-10-03 03:58 Seth Troisi ha scritto:

On Thu, Mar 26, 2020 at 2:00 PM Marco Bodrato
Il 2020-03-25 02:25 Seth Troisi ha scritto:
+ t = diff > 0 ? ((t + 1) | (t > 1)) :
+ ((t == 3) ? 2 : ((t - 2) | 1));

Maybe move this to the caller side? Or partially, leaving here just
ASSERT (t >= 2);
t |= (t != 2);

I moved this to the caller side (so that both findnext and


Really? :-) But it's ok anyway.


I would also check many gaps around 2^{16n}, to check if everything
works correctly when the search crosses the limb boundaries.
Maybe structuring the test with a table {"number(base16?)", gap},
i.e.
with (also) something like:
{"65521", 16},
{"4294967291", 20},
{"281474976710597", 80},
{"18446744073709551557", 72},
{"1208925819614629174706111", 78},
{"79228162514264337593543950319", 78},
{"5192296858534827628530496329220021", 100},
{"340282366920938463463374607431768211297", 210},

I did this (using hex form) I threw in some 16*n-1 also


I'd really add some more tests around boundaries... for both next and
prev.


Ĝis,
m


Re: mpz_prevprime

2020-10-03 Thread Marco Bodrato

Ciao,

Il 2020-10-03 03:58 Seth Troisi ha scritto:

I modified the patch a tiny bit. Still hoping to get this in for an


I think that the patch is interesting: a function for searching primes 
backward in the sequence of integers is missing and seems useful.


The proposed interface is the following.


@deftypefun int mpz_prevprime (mpz_t @var{rop}, const mpz_t @var{op})
@cindex Previous prime function
Set @var{rop} to the greatest prime less than @var{op}.


If a previous prime doesn't exist (i.e. @var{op} < 3), @var{rop} is
unchanged and 0 is returned.

Return 1 if @var{rop} is probably prime, and 2 if @var{rop} is definitely
prime.


I personally do not like the idea that a previous prime can "not exist", 
because in my opinion -2 is a prime, and there are as many negative 
primes as there are positive ones... The function mpz_probab_prime_p in 
our library agrees with my opinion... but... ok, that's my opinion only.


Anyway, the return value is used also to return something more 
interesting: 1 or 2, with the same meaning that the return value has for 
the function mpz_probab_prime_p.

Should we add a return value also to the function mpz_nextprime?

Ĝis,
m


Re: mpn_set_str_bits

2020-09-30 Thread Marco Bodrato

Ciao,

Il 2020-09-30 19:37 ni...@lysator.liu.se ha scritto:

Marco Bodrato  writes:


  limb = (unsigned int) sp[sn] >> (bits - shift);


That's easier to read than what I proposed.



Maybe worth a comment mentioning the problem case: mp_limb_t


Thanks Niels!

So, here is the current proposed rewrite:

static mp_size_t
mpn_set_str_bits (mp_ptr rp, const unsigned char *sp, size_t sn,
  unsigned bits)
{
  mp_size_t rn;
  mp_limb_t limb;
  unsigned shift;

  for (limb = 0, rn = 0, shift = 0; sn-- > 0; )
{
  limb |= (mp_limb_t) sp[sn] << shift;
  shift += bits;
  if (shift >= GMP_LIMB_BITS)
{
  shift -= GMP_LIMB_BITS;
  rp[rn++] = limb;
  /* Next line is correct also if shift == 0,
 bits == 8, and mp_limb_t == unsigned char. */
  limb = (unsigned int) sp[sn] >> (bits - shift);
}
}
  if (limb != 0)
rp[rn++] = limb;
  else
rn = mpn_normalized_size (rp, rn);
  return rn;
}

It seems simple enough. Any further comment?

Ĝis,
m


Re: mpn_set_str_bits

2020-09-30 Thread Marco Bodrato

Ciao!

Il 2020-09-30 09:03 ni...@lysator.liu.se ha scritto:

Marco Bodrato  writes:

  limb = sp[sn];
  if (GMP_LIMB_BITS > CHAR_BIT || shift > 0)
limb >>= bits - shift;
  else
limb = 0;



Do we really need to support bits == GMP_LIMB_BITS here? If not, the


About mpn_set_str, the manual reads "base can vary from 2 to 256". 
Moreover we are fool enough to (unofficially) support "typedef unsigned 
char mp_limb_t;" in mini-gmp...



above 5 lines could be simplified to just



  limb = (sp[sn] >> 1) >> (bits - 1 - shift);

it should be safe in all cases.


I agree.

Anyway, there may be a problem only if we shift sp[sn] (unsigned char) 
or limb (possibly the same type) by 8 bits (if bits=8 and shift=0).


An even simpler solution could be to cast to unsigned int (at least 16 
bits, right?) so:


  limb = (unsigned int) sp[sn] >> (bits - shift);

Maybe the cast is unnecessary, because the C standard says that the left 
operand of a shift operation is always promoted to (unsigned) int if it 
is smaller, correct? But it shouldn't be dangerous.


Ĝis,
m


Re: mpn_set_str_bits

2020-09-29 Thread Marco Bodrato

Ciao,

Il 2020-09-29 12:15 Raphael Rieu-Helft ha scritto:

The attached patch slightly improves the mini-gmp function
mpn_set_str_bits. The invariants are also a bit clearer (shift is the


The loop in mpz_import uses another strategy, a temporary limb.
This reduces the number of write operations into memory.

May I propose an alternative rewrite?
What do you think about it?

static mp_size_t
mpn_set_str_bits (mp_ptr rp, const unsigned char *sp, size_t sn,
  unsigned bits)
{
  mp_size_t rn;
  mp_limb_t limb;
  unsigned shift;

  for (limb = 0, rn = 0, shift = 0; sn-- > 0; )
{
  limb |= (mp_limb_t) sp[sn] << shift;
  shift += bits;
  if (shift >= GMP_LIMB_BITS)
{
  rp[rn++] = limb;
  shift -= GMP_LIMB_BITS;
  limb = sp[sn];
  if (GMP_LIMB_BITS > CHAR_BIT || shift > 0)
limb >>= bits - shift;
  else
limb = 0;
}
}
  if (limb != 0)
rp[rn++] = limb;
  else
rn = mpn_normalized_size (rp, rn);
  return rn;
}

Ĝis,
m


Re: State of PRNG code in GMP

2020-06-02 Thread Marco Bodrato

Ciao,

Il 2020-06-02 11:22 t...@gmplib.org ha scritto:

I'd like to have a coherent interface which also provide reentrant
random functions to the mpn layer.  And in no case should mpn pull in
mpz.


Makes sense.


Unfortunately, Mersenne Twister uses mpz, and I am not saying that that
was a bad choice when you implemented it.  But in order to provide
reentrant mpn PRNG functions, we either rewrite the relevant parts of MT
to use mpn, or exclude it from a new mpn PRNG interface.


Mersenne Twister only uses mpz for initialization. Moreover there is a
little "bug" in the initialization procedure, so that the sequence can
be the same even if the seed is different (in the range where it is
supposed to generate different sequences).


That's why some years ago we started rewriting that init function.
Of course this will yield to different sequences too.

https://gmplib.org/list-archives/gmp-bugs/2017-March/004106.html

Here is the almost mpz-free init function using the xxtea scrambler.



#define MX (((z >> 5 ^ y << 2) + (y >> 3 ^ z << 4)) \
^ ((sum ^ y) + (k[(p & 3) ^ e] ^ z)))
#define DELTA 0x9E3779B9

static void
mangle_buf (gmp_uint_least32_t *buf, int high_bit)
{
  /* Apply XXTEA decryption with a constant key.
 XXTEA offers good randomness and speed properties.  */
  static const gmp_uint_least32_t key[8] =
    {0x4BEDAF6D, 0x674DD5FB, 0xB79D42BC, 0x94C371EA,
     0x5443092C, 0xA67C9FE2, 0x31CC686A, 0xC41175D6};
  const gmp_uint_least32_t *k;
  gmp_uint_least32_t z;
  gmp_uint_least32_t y = buf[0] & 0xffffffff;
  gmp_uint_least32_t sum = ((DELTA << 1) + DELTA) << 1; /* DELTA*6 */
  int p, e;
  k = high_bit ? key + 4 : key;
  do {
e = sum >> 2 & 3;
p = N - 2;
do {
  z = buf[p - 1] & 0xffffffff;
  y = (buf[p] -= MX) & 0xffffffff;
} while (--p);
z = buf[N - 2] & 0xffffffff;
y = (buf[0] -= MX) & 0xffffffff;
  } while (sum -= DELTA);
}

/* Seeding function.  */
static void
xxtea_randseed_mt (gmp_randstate_t rstate, mpz_srcptr seed)
{
  int i, high_bit;
  size_t cnt;

  gmp_rand_mt_struct *p;
  mpz_t seed1;  /* Intermediate result.  */
  mp_limb_t mp_d[BITS_TO_LIMBS (19936)];

  high_bit = mpz_tstbit (seed, 19936);
  p = (gmp_rand_mt_struct *) RNG_STATE (rstate);

  PTR (seed1) = mp_d;
  ALLOC (seed1) = BITS_TO_LIMBS (19936);
  SIZ (seed1) = 0;

  mpz_fdiv_r_2exp (seed1, seed, 19936);
  mpz_export (&p->mt[1], &cnt, -1, sizeof (p->mt[1]), 0,
	      CHAR_BIT * sizeof (p->mt[1]) - 32, seed1);

  cnt++;
  ASSERT (cnt <= N);
  while (cnt < N)
p->mt[cnt++] = 0;

  mangle_buf (&p->mt[1], high_bit);	/* Perform the mangling of the buffer. */


  p->mt[0] = 0x8000; /* Set the first bit. */
  if (!high_bit) {
cnt = N - 1;
do {
  if (p->mt[cnt] != 0) {
p->mt[0] = 0;/* Unset the first bit. */
break;
  }
} while (--cnt != 0);
  }

  /* Warm the generator up if necessary.  */
  if (WARM_UP != 0)
for (i = 0; i < WARM_UP / N; i++)
  __gmp_mt_recalc_buffer (p->mt);

  p->mti = WARM_UP % N;
}



Here is some code I have tinkered with.



typedef enum {
  GMP_PRNG_ALG_LC,  GMP_PRNG_ALG_MT,  GMP_PRNG_ALG_AES,
  GMP_PRNG_ALG_LIM,  GMP_PRNG_ALG_DEFAULT = GMP_PRNG_ALG_AES
} gmp_prng_alg;


What's LIM?

Ĝis,
m


Re: mini-gmp warnings

2020-04-26 Thread Marco Bodrato

Il 2020-04-26 16:22 ni...@lysator.liu.se ha scritto:

Is there an easy way to run mini-gmp tests with
small limb size?


In mini-gmp/mini-gmp.h we have the following lines:

#ifndef MINI_GMP_LIMB_TYPE
#define MINI_GMP_LIMB_TYPE long
#endif

typedef unsigned MINI_GMP_LIMB_TYPE mp_limb_t;

So, you should define MINI_GMP_LIMB_TYPE to something like int, short, 
or char.

The following line works in my environment:

make CPPFLAGS="-DMINI_GMP_LIMB_TYPE=char" check-mini-gmp

Ĝis,
m



Re: mini-gmp warnings

2020-04-26 Thread Marco Bodrato

Ciao,

Il 2020-04-26 15:11 ni...@lysator.liu.se ha scritto:

ni...@lysator.liu.se (Niels Möller) writes:

There are a couple of other uses of LOCAL_SHIFT_BITS,
LOCAL_GMP_LIMB_BITS, LOCAL_CHAR_BIT, added as part of the support for
testing with small limb sizes. Is it for some reason important to 
refer



Now I've tried changing it, and it seems to also be a hack to avoid
warnings?


Exactly.


In this macro,

#define gmp_umul_ppmm(w1, w0, u, v) \
  do {  \
int LOCAL_GMP_LIMB_BITS = GMP_LIMB_BITS;\
if (sizeof(unsigned int) * CHAR_BIT >= 2 * GMP_LIMB_BITS)\
  { \
unsigned int __ww = (unsigned int) (u) * (v);   \
w0 = (mp_limb_t) __ww;  \
w1 = (mp_limb_t) (__ww >> LOCAL_GMP_LIMB_BITS);   \
  } \

if I change the shift to

w1 = (mp_limb_t) (__ww >> GMP_LIMB_BITS); \


Yes, and even more warnings are triggered if mp_limb_t is unsigned
char...


The idea of the hack is: if the compiler is not optimising, a 
non-constant shift is compiled; if the compiler optimises, a "variable" 
with a non-changing value should be removed, and the branch should be 
totally skipped.


I agree with your first proposal: use unsigned. It should be good even 
with -O0.


Ĝis,
m


Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p

2020-04-20 Thread Marco Bodrato

Ciao,

Il 2020-04-20 11:08 Vincent Lefevre ha scritto:

I think that in general, you should not write code that depends on
whether INT_MAX + INT_MIN == 0 or not (the constant INT_MAX + INT_MIN
might be useful in some rare cases, but I think that testing whether
this constant is 0 or not should be forbidden). This can mean that


Forbidden? Really! :-D

Anyway, using the numerical constant INT_MAX + INT_MIN is a good idea.

What about the following version of the function? :-D

int
mpz_fits_sint_p (const mpz_t u)
{
  return mpz_cmpabs_ui (u, (unsigned long) INT_MAX + (unsigned long)
(u->_mp_size < 0 ? -(INT_MAX + INT_MIN) : 0)) <= 0;
}

By the way, jokes aside, Niels is right: we do not need to optimise this
function.
Please Niels, choose the implementation you prefer and change also
mpz_fits_slong_p accordingly.


Ĝis,
m


Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p

2020-04-19 Thread Marco Bodrato

Il 2020-04-19 10:44 ni...@lysator.liu.se ha scritto:

Marco Bodrato  writes:


+int
+mpz_fits_sint_p (const mpz_t u)
+{
+  return (INT_MAX + INT_MIN == 0 || mpz_cmp_ui (u, INT_MAX) <= 0) &&
+mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 
0;

+}


I think this and mpz_fits_sshort_p would be simpler using mpz_cmp_si,

int
mpz_fits_sint_p (const mpz_t u)
{
  return mpz_cmp_si (u, INT_MAX) <= 0 && mpz_cmp_si (i, INT_MIN) >= 0;
}


The current implementation of _cmp_si in mini-gmp is:

mpz_cmp_si (const mpz_t u, long v)
{
  mp_size_t usize = u->_mp_size;

  if (v >= 0)
return mpz_cmp_ui (u, v);
  else if (usize >= 0)
return 1;
  else
return - mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, v));
}

So that the compiler may optimise your code to:

return mpz_cmp_ui (u, INT_MAX) <= 0 &&
   (u->_mp_size >= 0 ||
 mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0);


which basically is the condition I wrote, with one more branch.

BTW, do we have any C implementation where INT_MAX + INT_MIN == 0, i.e.,
not using two's complement?


I'm almost sure the compiler can optimise that out at compile time.

Ĝis,
m


Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p

2020-04-19 Thread Marco Bodrato

Ciao,

On the gmp-discuss list,

Il 2020-04-11 21:41 Simon Sobisch ha scritto:

mini-gmp provides mpz_fits_slong_p and mpz_fits_ulong_p, but it does
not provide the same for smaller integer types.


We can easily add the requested functions. I suggest the following code:

diff -r 805304ca965a mini-gmp/mini-gmp.c
--- a/mini-gmp/mini-gmp.c   Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.c   Sun Apr 19 09:47:39 2020 +0200
@@ -1565,6 +1565,32 @@
   return us >= 0 && mpn_absfits_ulong_p (u->_mp_d, us);
 }

+int
+mpz_fits_sint_p (const mpz_t u)
+{
+  return (INT_MAX + INT_MIN == 0 || mpz_cmp_ui (u, INT_MAX) <= 0) &&
+mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0;
+}
+
+int
+mpz_fits_uint_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, UINT_MAX) <= 0;
+}
+
+int
+mpz_fits_sshort_p (const mpz_t u)
+{
+  return (SHRT_MAX + SHRT_MIN == 0 || mpz_cmp_ui (u, SHRT_MAX) <= 0) &&
+mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, SHRT_MIN)) <= 0;
+}
+
+int
+mpz_fits_ushort_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, USHRT_MAX) <= 0;
+}
+
 long int
 mpz_get_si (const mpz_t u)
 {
diff -r 805304ca965a mini-gmp/mini-gmp.h
--- a/mini-gmp/mini-gmp.h   Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.h   Sun Apr 19 09:47:39 2020 +0200
@@ -244,6 +244,10 @@

 int mpz_fits_slong_p (const mpz_t);
 int mpz_fits_ulong_p (const mpz_t);
+int mpz_fits_sint_p (const mpz_t);
+int mpz_fits_uint_p (const mpz_t);
+int mpz_fits_sshort_p (const mpz_t);
+int mpz_fits_ushort_p (const mpz_t);
 long int mpz_get_si (const mpz_t);
 unsigned long int mpz_get_ui (const mpz_t);
 double mpz_get_d (const mpz_t);


I attach a patch with the tests as well (somewhat redundant).

Ĝis,
mdiff -r 805304ca965a mini-gmp/mini-gmp.c
--- a/mini-gmp/mini-gmp.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.c	Sun Apr 19 09:48:28 2020 +0200
@@ -1565,6 +1565,32 @@
   return us >= 0 && mpn_absfits_ulong_p (u->_mp_d, us);
 }
 
+int
+mpz_fits_sint_p (const mpz_t u)
+{
+  return (INT_MAX + INT_MIN == 0 || mpz_cmp_ui (u, INT_MAX) <= 0) &&
+mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, INT_MIN)) <= 0;
+}
+
+int
+mpz_fits_uint_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, UINT_MAX) <= 0;
+}
+
+int
+mpz_fits_sshort_p (const mpz_t u)
+{
+  return (SHRT_MAX + SHRT_MIN == 0 || mpz_cmp_ui (u, SHRT_MAX) <= 0) &&
+mpz_cmpabs_ui (u, GMP_NEG_CAST (unsigned long int, SHRT_MIN)) <= 0;
+}
+
+int
+mpz_fits_ushort_p (const mpz_t u)
+{
+  return u->_mp_size >= 0 && mpz_cmpabs_ui (u, USHRT_MAX) <= 0;
+}
+
 long int
 mpz_get_si (const mpz_t u)
 {
diff -r 805304ca965a mini-gmp/mini-gmp.h
--- a/mini-gmp/mini-gmp.h	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/mini-gmp.h	Sun Apr 19 09:48:28 2020 +0200
@@ -244,6 +244,10 @@
 
 int mpz_fits_slong_p (const mpz_t);
 int mpz_fits_ulong_p (const mpz_t);
+int mpz_fits_sint_p (const mpz_t);
+int mpz_fits_uint_p (const mpz_t);
+int mpz_fits_sshort_p (const mpz_t);
+int mpz_fits_ushort_p (const mpz_t);
 long int mpz_get_si (const mpz_t);
 unsigned long int mpz_get_ui (const mpz_t);
 double mpz_get_d (const mpz_t);
diff -r 805304ca965a mini-gmp/tests/t-signed.c
--- a/mini-gmp/tests/t-signed.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mini-gmp/tests/t-signed.c	Sun Apr 19 09:48:28 2020 +0200
@@ -1,6 +1,6 @@
 /* Exercise some mpz_..._si functions.
 
-Copyright 2013, 2016 Free Software Foundation, Inc.
+Copyright 2013, 2016, 2020 Free Software Foundation, Inc.
 
 This file is part of the GNU MP Library test suite.
 
@@ -197,9 +197,152 @@
 }
 
 void
+try_fits_utype_p (void)
+{
+  mpz_t x;
+  mpz_init (x);
+  if (!mpz_fits_ulong_p (x))
+{
+  printf ("mpz_fits_ulong_p (0) false!\n");
+  abort ();
+}
+  if (!mpz_fits_uint_p (x))
+{
+  printf ("mpz_fits_uint_p (0) false!\n");
+  abort ();
+}
+  if (!mpz_fits_ushort_p (x))
+{
+  printf ("mpz_fits_ushort_p (0) false!\n");
+  abort ();
+}
+  mpz_set_si (x, -1);
+  if (mpz_fits_ulong_p (x))
+{
+  printf ("mpz_fits_ulong_p (- 1) true!\n");
+  abort ();
+}
+  if (mpz_fits_uint_p (x))
+{
+  printf ("mpz_fits_uint_p (- 1) true!\n");
+  abort ();
+}
+  if (mpz_fits_ushort_p (x))
+{
+  printf ("mpz_fits_ushort_p (- 1) true!\n");
+  abort ();
+}
+  mpz_set_ui (x, ULONG_MAX);
+  if (!mpz_fits_ulong_p (x))
+{
+  printf ("mpz_fits_ulong_p (ULONG_MAX) false!\n");
+  abort ();
+}
+  mpz_add_ui (x, x, 1);
+  if (mpz_fits_ulong_p (x))
+{
+  printf ("mpz_fits_ulong_p (ULONG_MAX + 1) true!\n");
+  abort ();
+}
+  mpz_set_ui (x, UINT_MAX);
+  if (!mpz_fits_uint_p (x))
+{
+  printf ("mpz_fits_uint_p (UINT_MAX) false!\n");
+  abort ();
+}
+  mpz_add_ui (x, x, 1);
+  if (mpz_fits_uint_p (x))
+{
+  printf ("mpz_fits_uint_p (UINT_MAX + 1) true!\n");
+  abort ();
+}
+  mpz_set_ui (x, USHRT_MAX);
+  if (!mpz_fits_ushort_p (x))
+{
+  printf 

Re: possible speedup for mpz_nextprime_small

2020-04-03 Thread Marco Bodrato

On 2020-03-27 21:36, Seth Troisi wrote:

note I didn't get the time to look over the patch so it might be
extra rough.


 Original message 
Subject: Re: Cleanup mpz_out_str in tests
On 2020-03-26 10:10, Seth Troisi wrote:

"Why" today was that I had a little time today and long ago Niels had
given this idea a soft yes.


Sorry but... from my personal point of view this is a contradiction.

You have the time to write an inessential patch, but you do not have the 
time to check the code that you ask us to read?


I suggest you invest more of your time in re-reading, improving, and 
refining your code before throwing it at this list. On my 
side, I feel I have invested enough time showing you how to improve the 
quality of your code. It's your turn.


Ĝis,
m



Proposed patch for mpn_trialdiv

2020-04-03 Thread Marco Bodrato

Ciao,

the current code in mpn/generic/trialdiv.c contains a comment that 
suggests removing "one of the outer loop conditions".


The patch I attach removes it by checking only the number of primes, and 
limiting the maximum for the latter.


I also added a possible optimisation for a single limb.

A call to mpn_trialdiv should also be used in mpz/pprime_p.c, instead 
of the "Do more dividing." loop.


The original code was by Torbjorn. Any comment?

*** ../gmp-head/mpn/generic/trialdiv.c	2018-05-08 06:15:55.711524824 
+0200

--- mpn/generic/trialdiv.c  2020-04-03 23:46:03.500943337 +0200
***
*** 76,84 
  #include "trialdivtab.h"
  #undef WANT_dtab
  #undef P
-   {0,0}
  };

  static const struct gmp_primes_ptab gmp_primes_ptab[] =
  {
  #define WANT_ptab
--- 76,85 
  #include "trialdivtab.h"
  #undef WANT_dtab
  #undef P
  };

+ #define DTAB_LINES (numberof (gmp_primes_dtab))
+
  static const struct gmp_primes_ptab gmp_primes_ptab[] =
  {
  #define WANT_ptab
***
*** 88,95 

  #define PTAB_LINES (sizeof (gmp_primes_ptab) / sizeof 
(gmp_primes_ptab[0]))


- /* FIXME: We could optimize out one of the outer loop conditions if we
-had a final ptab entry with a huge np field.  */
  mp_limb_t
  mpn_trialdiv (mp_srcptr tp, mp_size_t tn, mp_size_t nprimes, int 
*where)

  {
--- 89,94 
***
*** 101,108 

ASSERT (tn >= 1);

!   for (i = *where; i < PTAB_LINES; i++)
  {
ppp = gmp_primes_ptab[i].ppp;
cps = gmp_primes_ptab[i].cps;

--- 100,136 

ASSERT (tn >= 1);

!   ASSERT (DTAB_LINES == gmp_primes_ptab[PTAB_LINES - 1].np +
!   gmp_primes_ptab[PTAB_LINES - 1].idx);
!   nprimes = MIN (nprimes, DTAB_LINES);
!
! #if 0
!   if (tn == 1)
! {
!   dp = gmp_primes_dtab;
!   r = *tp;
!   /* In this branch the value *where is interpreted differently.
!If tn decreases to 1, some primes will be checked again.
!The opposite (tn increases) should not happen.
!
!THINK: Should we avoid this? Maybe using positive and
!negative values for the two different uses? So that we can
!detect when we switch from one another.
!   */
!   for (i = *where; i < nprimes; ++i)
!   if (r * dp[i].binv <= dp[i].lim)
! {
!   *where = i + 1;
!   return dp[i].binv;
! }
!   return 0;
! }
! #endif
!
!   nprimes -= gmp_primes_ptab[*where].idx;
!   for (i = *where; nprimes > 0; ++i)
  {
+   ASSERT (i < PTAB_LINES);
ppp = gmp_primes_ptab[i].ppp;
cps = gmp_primes_ptab[i].cps;

***
*** 113,118 
--- 141,147 

/* Check divisibility by individual primes.  */
	dp = &gmp_primes_dtab[idx] + np;
+   ASSERT (idx + np <= DTAB_LINES);
for (j = -np; j < 0; j++)
{
  q = r * dp[j].binv;
***
*** 124,131 
}

nprimes -= np;
-   if (nprimes <= 0)
-   return 0;
  }
return 0;
  }
--- 153,158 


Ĝis,
m

diff -r 805304ca965a mpn/generic/trialdiv.c
--- a/mpn/generic/trialdiv.c	Tue Mar 24 23:13:28 2020 +0100
+++ b/mpn/generic/trialdiv.c	Fri Apr 03 23:49:30 2020 +0200
@@ -76,9 +76,10 @@
 #include "trialdivtab.h"
 #undef WANT_dtab
 #undef P
-  {0,0}
 };
 
+#define DTAB_LINES (numberof (gmp_primes_dtab))
+
 static const struct gmp_primes_ptab gmp_primes_ptab[] =
 {
 #define WANT_ptab
@@ -88,8 +89,6 @@
 
 #define PTAB_LINES (sizeof (gmp_primes_ptab) / sizeof (gmp_primes_ptab[0]))
 
-/* FIXME: We could optimize out one of the outer loop conditions if we
-   had a final ptab entry with a huge np field.  */
 mp_limb_t
 mpn_trialdiv (mp_srcptr tp, mp_size_t tn, mp_size_t nprimes, int *where)
 {
@@ -101,8 +100,37 @@
 
   ASSERT (tn >= 1);
 
-  for (i = *where; i < PTAB_LINES; i++)
+  ASSERT (DTAB_LINES == gmp_primes_ptab[PTAB_LINES - 1].np +
+			gmp_primes_ptab[PTAB_LINES - 1].idx);
+  nprimes = MIN (nprimes, DTAB_LINES);
+
+#if 0
+  if (tn == 1)
 {
+  dp = gmp_primes_dtab;
+  r = *tp;
+  /* In this branch the value *where is interpreted differently.
+	 If tn decreases to 1, some primes will be checked again.
+	 The opposite (tn increases) should not happen.
+
+	 negative values for the two different uses? So that we can
+	 negative values for the two differen uses? So that we can
+	 detect when we switch from one another.
+  */
+  for (i = *where; i < nprimes; ++i)
+	if (r * dp[i].binv <= dp[i].lim)
+	  {
+	*where = i + 1;
+	return dp[i].binv;
+	  }
+  return 0;
+}
+#endif
+
+  nprimes -= gmp_primes_ptab[*where].idx;
+  for (i = *where; nprimes > 0; ++i)
+{
+  ASSERT (i < PTAB_LINES);
   ppp = gmp_primes_ptab[i].ppp;
   cps = gmp_primes_ptab[i].cps;
 
@@ -113,6 +141,7 @@
 
   /* Check divisibility by individual primes.  */
  dp = &gmp_primes_dtab[idx] + np;
+  ASSERT (idx + np <= DTAB_LINES);
   for (j = -np; j < 0; j++)
 	{
 	  q = r * dp[j].binv;
@@ -124,8 +153,6 @@
 	}
 
  

Re: mpz_prevprime

2020-03-26 Thread Marco Bodrato

Ciao,

On 2020-03-25 02:25, Seth Troisi wrote:

I'm back with a new mpz_prevprime patch


diff -r 805304ca965a doc/gmp.texi
+@deftypefun int mpz_prevprime (mpz_t @var{rop}, const mpz_t @var{op})

+If previous prime doesn't exist (e.g. @var{op} < 3), rop is unchanged 
and

+0 is returned.

I suggest "i.e." instead of "e.g.".

diff -r 805304ca965a mpz/nextprime.c
+findnext_small (unsigned t, short diff)
[...]
+  ASSERT (t > 0 || (diff < 0 && t > 2));

This is equivalent to (t > 0). Do you mean (t > 2 || (diff > 0 && t > 
0))?


+  t = diff > 0 ? ((t + 1) | (t > 1)) :
+ ((t == 3) ? 2 : ((t - 2) | 1));

Maybe move this to the caller side? Or partially, leaving here just
  ASSERT (t >= 2);
  t |= (t != 2);

-void
-mpz_nextprime (mpz_ptr p, mpz_srcptr n)
+int
+findnext (mpz_ptr p,

If the function is not public any more, use static.

-  mpz_setbit (p, 0);

This line can still be here, maybe converted to
*PTR (p) |= 1;

+  /* smaller numbers handled earlier*/
+  ASSERT (nbits >= 15);

Not needed. Maybe ASSERT (nbits > 3), because we don't want to handle 
very small numbers with this code.


+int
+mpz_prevprime (mpz_ptr p, mpz_srcptr n)
+{
+  /* First handle tiny numbers */
+  if (mpz_cmp_ui (n, 2) <= 0)
+return 0;
+
+  if (mpz_cmp_ui (n, NP_SMALL_LIMIT) < 0)
+{
+  ASSERT (NP_SMALL_LIMIT < UINT_MAX);
+  mpz_set_ui (p, findnext_small (SIZ (n) > 0 ? mpz_get_ui (n) : 1, -2));

+  return 2;
+}

The line "if (mpz_cmp_ui (n, 2) <= 0)" already handles the case (SIZ (n) 
<= 0), so checking (SIZ (n) > 0) is nonsense.
Anyway... assume that code is used: what happens if findnext_small(1,-2) 
is called?


diff -r 805304ca965a tests/mpz/t-nextprime.c
[...]
+refmpz_prevprime (mpz_ptr p, mpz_srcptr t)
[...]
+  if (mpz_cmp_ui(t, 3) <= 0)

The loop after that branch also handles that case, doesn't it?


 test_largegaps ()
 {
[...]
+  mpz_set_ui (n, 3842610773);
[...]
+  mpz_set_ui (n, 18361375334787046697UL);

Those lines can fail at compile time... if 3842610773 does not fit in an 
int and 18361375334787046697 not in an unsigned long. E.g. when 
sizeof(int)==4.

Use _set_str.

+  mpz_mul_ui (n, n, 4280516017);
[...]
+  mpz_mul_ui (n, n, 3483347771);

...


I also added large negative tests for mpz_nextprime


To be honest... I do not like this (undocumented) detail of the current 
behaviour of _nextprime. I personally do not like the idea of enforcing 
it with a test...



and we can enable test_largegaps now that the default gap is smaller


Yes, we should.
I would also check many gaps around 2^{16n}, to check if everything 
works correctly when the search crosses the limb boundaries.

Maybe structuring the test with a table {"number(base16?)", gap}, i.e.
with (also) something like:
{"65521", 16},
{"4294967291", 20},
{"281474976710597", 80},
{"18446744073709551557", 72},
{"1208925819614629174706111", 78},
{"79228162514264337593543950319", 78},
{"5192296858534827628530496329220021", 100},
{"340282366920938463463374607431768211297", 210},


+void
+test_prevprime (gmp_randstate_ptr rands, int reps)
[...]
+  /* Test mpz_prevprime(3 <= n < 2^45) returns 2. */
[...]
+  /* Test mpz_prevprime(n > 2^70) returns 1. */
[...]
+if ( retval != 1 )
[...]

I do not understand. A test-function should fail if a function's behaviour 
does not agree with the documented interface, or it can fail for some 
other kind of regression that we want to avoid...
Your test will warn us against possible improvements! What if the future 
primality testing functions will return 2 on some large primes?


Ok, I'm sleepy now. I hope I did not write wrong things. Good night!

Ĝis,
m


Re: possible speedup for mpz_nextprime_small

2020-03-26 Thread Marco Bodrato

Ciao,

On 2020-03-26 01:06, Seth Troisi wrote:

Marco was interested in what using dynamically allocating primes would
look like.
I've attached a couple of screenshots.


I'd like to understand the reason why the line "dynamic" is 5-10% lower 
than the other lines in the range above 27...



I think a good balance would be to add 70 primes and increase
SMALL_LIMIT to 1,000,000


Adding back the primes we just removed, because they are useful again!


We would need a new constant to differentiate
if (nbits / 2 < NUMBER_OF_PRIMES)
  and
ASSERT (i < NUMBER_OF_PRIMES);


Yes. The first is a threshold to switch from one method to another. Should 
it be tuned on different CPUs?

NUMBER_OF_PRIMES is the hard limit for that threshold.

Ĝis,
m


Re: Cleanup mpz_out_str in tests

2020-03-26 Thread Marco Bodrato

Ciao,

On 2020-03-26 02:15, Seth Troisi wrote:

This cleans up a number of

printf(...)
mpz_out_str(stdout, 10/16, var);
printf(...);

and replaces them with

gmp_printf(...%Zd..., var);


Why? Is the current code not working?

In mini-gmp we have mpz_out_str, but we don't have gmp_printf.
The code with mpz_out_str is (more) easily re-usable with mini-.

Ĝis,
m


Re: mpz_prevprime

2020-03-24 Thread Marco Bodrato

Ciao,

On 2020-03-24 18:54, Seth Troisi wrote:

On Tue, Mar 24, 2020 at 9:56 AM Marco Bodrato
 wrote:

I propose a variation of your patch, you find it attached, on my
computer it's faster.


Couple of small notes but otherwise this looks good

+/* NP_SMALL_LIMIT = prevprime (LAST_PRIME ^ 2) */

In the light of the day this can be bumped slightly to
prevprime(nextprime(LAST_PRIME)^2) - 1 = 316960


I'm not sure.
I mean: with that list of primes we can certify primality up to that 
limit, I agree.

But what about the exit condition?
The current code considers a number prime if (n/prime < prime). If we want to go beyond LAST_PRIME^2, we should add one more branch in the loop.


+  /* Technically this should be prev_prime(LAST_PRIME ^ 2) */
I'd remove this, it's covered by the comment above


Right.


+  if (q < prime)
+return t;
+  if (r == 0)
+break;

Can this be rearranged to

+   if (r == 0)
+  break;
+   if (q <= prime)
+  return t;


I fear it can't. The case t=3, prime=3 would consider 3 a composite.
Moreover I believe that, on some architectures, q can be ready before r 
is, so the order Torbjorn used should reduce latency. But I may be wrong 
on that point.


The case t=3 can be healed with an if(t<9) return t; just before the 
loop. Then we should measure speed on different platforms.

With the order you propose, we may use
NP_SMALL_LIMIT = prevprime (LAST_PRIME * (LAST_PRIME + 1))
right?


+  ASSERT (i < NUMBER_OF_PRIMES);
Should this be placed at the start of the loop, or be:
+  ASSERT (i+1 < NUMBER_OF_PRIMES);


The ASSERT on i makes sense just before i is used by the instruction 
prime+=primegap[i], i.e. at the end of the loop. Doesn't it?



If writing the code is not too complex, it may be interesting to
test if it is worth.


I'll try it out tonight


Great!


Did you try tweaking with the -p parameter? It can be useful when a
measure is "noisy".


Nope I had not, using -p3 seems to work just as well.


I would have used something like -p10 or -p100, but if -p3 is 
good for you, I'm happy :-)


Ĝis,
m


Re: mpz_prevprime

2020-03-24 Thread Marco Bodrato

On 2020-03-24 01:10, Seth Troisi wrote:

(I don't think your ulong patch is needed but I can measure at a


It's a small patch, but the speed-up it gives is probably too small to 
be worth adding.



My code doesn't use a sieve (and instead checks each number for
divisors similar to the old code) because the average gap for small
numbers is small 30 / primepi(30) = 11


There is a small error in the code, it does not handle negative numbers 
correctly.


For each divisor your code uses two operations, a product (prime*prime) 
and a remainder (t % prime). On many architectures computing t/prime and 
t%prime cost just one operation... that's why I'd use the ideas Torbjorn 
used in isprime() in dumbmp.c,  
https://gmplib.org/repo/gmp/rev/7524222ab7d4 currently in bootstrap.c .


I propose a variation of your patch, you find it attached, on my 
computer it's faster.



Because the measured threshold for this was much larger (~80 million),
it might actually make sense to use a sieve after some threshold



If you think that's a 1.5-3x speedup for inputs 16-30bits is important
I can try to dynamically generate a longer list of primes


If writing the code is not too complex, it may be interesting to test if 
it is worth.



On Mon, Mar 23, 2020 at 2:38 PM Marco Bodrato
 wrote:



It might also be useful to commit tune_nextprime_small which
dynamically selects a higher reps count for
mpz_nextprime input and helps increase precision.


Uhm...
j = 20 / s->size;



This is a great point, I modified the code so it only changes the
outer loop count.
I included a screenshot showing how much this reduces variance over
multiple runs.


Did you try tweaking with the -p parameter? It can be useful when a 
measure is "noisy".



PS: a comment in the current code says
/* Larger threshold is faster but takes ~O(n/ln(n)) memory.
To be honest, the array sieve needs n/24 bytes, and the array



You are correct, let's change the comment by which I mean you :)


Done.

Ĝis,
m

--
http://bodrato.it/papers/

diff -r a7836288d2c0 mpz/nextprime.c
--- a/mpz/nextprime.c	Fri Mar 20 16:19:07 2020 +0100
+++ b/mpz/nextprime.c	Tue Mar 24 17:24:41 2020 +0100
@@ -85,6 +85,9 @@
 };
 
 #define NUMBER_OF_PRIMES 100
+#define LAST_PRIME 557
+/* NP_SMALL_LIMIT = prevprime (LAST_PRIME ^ 2) */
+#define NP_SMALL_LIMIT 310243
 
 static unsigned long
 calculate_sievelimit(mp_bitcnt_t nbits) {
@@ -120,6 +123,32 @@
   return sieve_limit;
 }
 
+static unsigned
+mpz_nextprime_small (unsigned t)
+{
+  ASSERT (t > 0); /* Expect t=1 if the operand was smaller.*/
+  /* Technically this should be prev_prime(LAST_PRIME ^ 2) */
+  ASSERT (t < NP_SMALL_LIMIT);
+
+  /* Start from next candidate (2 or odd) */
+  t = (t + 1) | (t > 1);
+  for (; ; t += 2)
+{
+  unsigned prime = 3;
+  for (int i = 0; ; prime += primegap_small[i++])
+	{
+	  unsigned q, r;
+	  q = t / prime;
+	  r = t - q * prime; /* r = t % prime; */
+	  if (q < prime)
+	return t;
+	  if (r == 0)
+	break;
+	  ASSERT (i < NUMBER_OF_PRIMES);
+	}
+}
+}
+
 void
 mpz_nextprime (mpz_ptr p, mpz_srcptr n)
 {
@@ -132,18 +161,16 @@
   unsigned odds_in_composite_sieve;
   TMP_DECL;
 
-  /* First handle tiny numbers */
-  if (mpz_cmp_ui (n, 2) < 0)
+  /* First handle small numbers */
+  if (mpz_cmp_ui (n, NP_SMALL_LIMIT) < 0)
 {
-  mpz_set_ui (p, 2);
+  ASSERT (NP_SMALL_LIMIT < UINT_MAX);
+  mpz_set_ui (p, mpz_nextprime_small (SIZ (n) > 0 ? mpz_get_ui (n) : 1));
   return;
 }
   mpz_add_ui (p, n, 1);
   mpz_setbit (p, 0);
 
-  if (mpz_cmp_ui (p, 7) <= 0)
-return;
-
   TMP_MARK;
   pn = SIZ(p);
   MPN_SIZEINBASE_2EXP(nbits, PTR(p), pn, 1);


Re: mpz_prevprime

2020-03-23 Thread Marco Bodrato

On 2020-03-23 22:38, Marco Bodrato wrote:


Using unsigned long to compute the modulus is obviously faster than
many calls to mpz_cdiv_ui. I tried a patch that surely is not as fast
as yours, but should speed-up all numbers that fits in a single
unsigned long. The speed-up seems small, I'm not sure it's worth.


... it is not very interesting, but I attach the patch I wrote.

Ĝis,
m

diff -r a7836288d2c0 mpz/nextprime.c
--- a/mpz/nextprime.c	Fri Mar 20 16:19:07 2020 +0100
+++ b/mpz/nextprime.c	Mon Mar 23 22:39:43 2020 +0100
@@ -202,6 +202,29 @@
 
   memset (composite, 0, odds_in_composite_sieve);
   prime = 3;
+  if (mpz_fits_ulong_p (p))
+	{
+	  unsigned long p0 = mpz_get_ui (p);
+
+	  for (i = 0; i < prime_limit; i++)
+	{
+	  m = p0 % prime;
+	  /* Only care about odd multiples of prime. */
+	  if (m & 1)
+		m = (prime - m) >> 1;
+	  else
+		{
+		  composite[0] |= m == 0;
+		  m = prime - (m >> 1);
+		}
+
+	  /* Mark off any composites in sieve */
+	  for (; m < odds_in_composite_sieve; m += prime)
+		composite[m] = 1;
+	  prime += primegap[i];
+	}
+	}
+  else
   for (i = 0; i < prime_limit; i++)
 {
   m = mpz_cdiv_ui(p, prime);


Re: mpz_prevprime

2020-03-23 Thread Marco Bodrato

Ciao,

On 2020-03-22 10:00, Seth Troisi wrote:

I measure the regression as ~5-15% for input < ~10,000


And no regression for larger input? Good.


I have a proposed patch which uses int, and trial div up to some
threshold this appears to be 5x faster.


A couple of questions arise.
5x faster than what? Than the patch I proposed or than the old/current 
code?


Which one is the good idea? using int or avoiding the sieve?

Using unsigned long to compute the modulus is obviously faster than many 
calls to mpz_cdiv_ui. I tried a patch that surely is not as fast as 
yours, but should speed up all numbers that fit in a single unsigned 
long. The speed-up seems small, I'm not sure it's worth it.



The cut off point as I measure it is ~80,000,000 but requires
increasing the primegap_small list to ~1100 entries

I'm not sure what you think about a list of that length.


Well, the cut off point is surely different on different architectures. 
It should be tuned.
Moreover: a large list ... can't it be generated on the fly? If we want 
a larger table, in my opinion it should be an improvement for all the 
functions that need primes (_primorial, _bin, _fac...).



It might also be useful to commit tune_nextprime_small which
dynamically selects a higher reps count for
mpz_nextprime input and helps increase precision.


Uhm...
j = 20 / s->size;

This means that
for size=10 you compute 2 primes: all primes between 2^10 and 2^18.
for size=11: all primes between 2^11 and 2^18.
for size=12... and so on up to 2^18.

This is not exactly representative of the speed of nextprime on operands 
around 2^10, 2^11... and so on.



PS: a comment in the current code says
 /* Larger threshold is faster but takes ~O(n/ln(n)) memory.
To be honest, the array sieve needs n/24 bytes, and the array 
primegap_tmp uses primepi(n) ~= n/ln(n) bytes. And asymptotically the 
sum of both is O(n).


Ĝis,
m


Re: mpz_prevprime

2020-03-21 Thread Marco Bodrato

On 2020-03-21 11:42, Seth Troisi wrote:

I see this was pushed, with a few more polishing tweaks.


Yes, but there is bad news: it introduces a speed regression for small 
numbers.



I see you also added more testing in tests/devel/primes.c which is a
great triple check.
It looks like the usage on line 21-28 / 399 wasn't updated.


Not yet updated, you are right.

I also added a target for tune/speed.

tune/speed -s<size> mpz_nextprime_1.<count>
measures the time needed to compute the first <count> primes larger than
<size>, and divides by <count>.


If on my environment I try
$ tune/speed -s1-4000 -t2 mpz_nextprime_1.16
before, and after the patch, I see that nextprime on small numbers is 
now slower.


Before introducing new functions, may I suggest to try to speed up the 
function also for small numbers?


You measured how much it is worth sieving before using the Miller-Rabin 
(actually BPSW) test. But for really small numbers, the sieve alone can 
be enough and the expensive test can be totally skipped.


I attach a possible patch for this purpose.

Ĝis,
mdiff -r a7836288d2c0 mpz/nextprime.c
--- a/mpz/nextprime.c	Fri Mar 20 16:19:07 2020 +0100
+++ b/mpz/nextprime.c	Sat Mar 21 15:16:27 2020 +0100
@@ -85,6 +85,11 @@
 };
 
 #define NUMBER_OF_PRIMES 100
+#define LAST_PRIME 557
+/* SIEVE_ONLY_GAP = maximal gap between primes < LAST_PRIME^2,
+   We are between 155921+86=156007 and 360653+96=360749 .
+*/
+#define SIEVE_ONLY_GAP 86
 
 static unsigned long
 calculate_sievelimit(mp_bitcnt_t nbits) {
@@ -130,6 +135,7 @@
   mp_bitcnt_t nbits;
   int i, m;
   unsigned odds_in_composite_sieve;
+  int sieve_only;
   TMP_DECL;
 
   /* First handle tiny numbers */
@@ -141,10 +147,37 @@
   mpz_add_ui (p, n, 1);
   mpz_setbit (p, 0);
 
-  if (mpz_cmp_ui (p, 7) <= 0)
-return;
+  TMP_MARK;
 
-  TMP_MARK;
+  if (mpz_cmp_ui (p, LAST_PRIME*LAST_PRIME - SIEVE_ONLY_GAP) <= 0)
+{
+  unsigned long pl, cp;
+  int i;
+  primegap = primegap_small;
+
+  pl = mpz_get_ui (p);
+  cp = 3;
+  i = 0;
+  if (mpz_cmp_ui (p, LAST_PRIME) <= 0)
+	{
+	  for ( ; cp < pl; cp += primegap[i++])
+	;
+	  mpz_set_ui (p, cp);
+	  return;
+	}
+
+  odds_in_composite_sieve = SIEVE_ONLY_GAP / 2;
+  pl += SIEVE_ONLY_GAP - 2;
+  do {
+	cp += primegap[i++];
+  } while (cp*cp < pl);
+
+  prime_limit = i;
+  sieve_only = 1;
+}
+  else
+{
+/* TODO: Reindent the following code. */
   pn = SIZ(p);
   MPN_SIZEINBASE_2EXP(nbits, PTR(p), pn, 1);
   if (nbits / 2 <= NUMBER_OF_PRIMES)
@@ -192,6 +225,9 @@
 /* Corresponds to a merit 14 prime_gap, which is rare. */
 odds_in_composite_sieve = 5 * nbits;
 
+  sieve_only = 0;
+}
+
   /* composite[2*i] stores if p+2*i is a known composite */
   composite = TMP_SALLOC_TYPE (odds_in_composite_sieve, char);
 
@@ -226,7 +262,7 @@
   difference = 0;
 
   /* Miller-Rabin test */
-  if (mpz_millerrabin (p, 25))
+  if (sieve_only || mpz_millerrabin (p, 25))
 	{
 	  TMP_FREE;
 	  return;


Re: mpz_prevprime

2020-03-17 Thread Marco Bodrato

Ciao,

On 2020-03-16 02:23, Seth Troisi wrote:

On Sun, Mar 15, 2020 at 4:38 PM Marco Bodrato
 wrote:

May I write a few more comments?



Always, my opinion about being ready is just that it's passed
the bar of being good enough not that it's perfect.


I'm tempted to simply commit your proposal and then change it :-)
But if we keep on discussing here, we can improve the code with the 
opinions of two persons...



I see now:
+ if (nbits <= 32)
+ incr_limit = 336;



Probably, we should change the name of the variable to
odd_candidates_in_sieve, then you would probably have written:
if (nbits <= 32)
odd_numbers_in_sieve = 168;



Renamed to odds_in_composite_sieve


Of course the name is not essential, the values are :-)
And they seem correct to me, now.


Then I wonder if the cost of the /* Sieve next segment */ step

[...]

need to allocate the possibly huge next_mult array?


You're correct, this reduces the memory use by 80%


Moreover the code has now the following structure:

  SIEVE;
  for (;;) {
CHECK_AND_POSSIBLY_BREAK;
SIEVE;
  }

There is an unneeded code duplication, it should be:

  for (;;) {
SIEVE;
CHECK_AND_POSSIBLY_BREAK;
  }


If you need to store larger gaps, you can also save in
the array gap>>1, because they all are even. The prime
limit will be able to go beyond 2^37...


 Updated comment for how this could be done in the future.


The first gap larger than 500, if I'm not wrong, is
304599508537+514=304599509051


I'd expect a condition like

+ if (nbits <= numberof (primegap_small) * 2 + 1)
+ {
+ primegap = primegap_small;
+ prime_limit = nbits / 2;
+ }



Done


I read:
+  if (2 * nbits <= NUMBER_OF_PRIMES)
+{
+  primegap = primegap_small;
+  prime_limit = nbits / 2;
+}

But ... this means that at most one fourth of the constants in 
primegap_small are used, am I wrong?


Ĝis,
m


Re: Moving LOOP_ON_SIEVE_* macros to gmp-impl.h

2020-03-17 Thread Marco Bodrato

Ciao,

On 2020-03-16 04:36, Seth Troisi wrote:

Per Marco's comments in my prev_prime/next_prime patch
I moved some of the primesieve macros and functions to gmp-impl.h



There are two reasons why I never moved those functions and macros to 
gmp-impl.h, two aspects of one problem: the interface is not clean.


First: I'm not sure I like macros that open a loop and don't close it...

Second: n_to_bit() is not injective, obviously.
E.g. n_to_bit(7) = 1, n_to_bit(10) = 1 .
This is not a problem when its value is used for n_to_bit (), but 
generates confusion if n_to_bit() is used on a  value that 
is not carefully chosen...



The first, is maybe just an opinion of mine, do you think those macros 
are reasonably clean?


The second should be healed somehow. The easiest way is probably to 
write two different functions:

static mp_limb_t
n_to_bit_end (mp_limb_t n) { return ((n-5)|1)/3U; }

static mp_limb_t
n_to_bit_start (mp_limb_t n) { return (n-2-n%2)/3U; }


bit_to_n (renamed sieve_bit_to_n)
id_to_n (renamed sieve_id_to_n)
n_to_bit (renamed sieve_n_to_bit)


Renaming is a good idea, IMO.


This allows the four (soon to be five) identical copies in
bin_uiui, oddfac_1, primorial_ui, and tests/devel/primes
to be cleaned up.


Uhm I have (slightly) changed the macros. Not the interface.


It's not clear where this should be documented, if someone tells


Niels' answer is perfect.

Ĝis,
m


Re: mpz_prevprime

2020-03-15 Thread Marco Bodrato

Ciao,

On 2020-03-15 00:07, Seth Troisi wrote:

New patch which is cleaned up and IMO ready to commit


May I write a few more comments?


On Mon, Mar 9, 2020 at 5:15 PM Seth Troisi  wrote:



I also dislike the boiler plate of the macros but I didn't


Those macros should probably be moved to gmp-impl.h, so we avoid 
duplication.
For this to be possible, we should avoid having different versions of 
the same macros in the different files...



When nbits is small (IMHO the most common case) we'd like to be able
to fallback to a constant array which I don't think exist for
primesieve (because gmp_limb_bits isn't constant).


Only the first "limb" of the sieve is hard-coded in primesieve.c, but it 
is possible to generate a larger "seed" at configure time, if we want.



Why not using a variable for INCR_LIMIT?


I see now:
+  if (nbits <= 32)
+incr_limit = 336;
+  else if (nbits <= 64)
+incr_limit = 1550;
+  else
+/* Corresponds to a merit 10 prime_gap, which is rare. */
+incr_limit = 7 * nbits / 2;

Probably, we should change the name of the variable to 
odd_candidates_in_sieve, then you would probably have written:

 if (nbits <= 32)
   odd_numbers_in_sieve = 168;
 else if (nbits <= 64)
   odd_numbers_in_sieve = 775;
 else
  ...

Then I wonder if the cost of the /* Sieve next segment */ step is so 
high that we really have to try hard to avoid it. And if we are able to 
reduce the probability of its use to very unlikely, do we then really 
need to allocate the possibly huge next_mult array?


+  mpz_root (tmp, tmp, 2);

I'd use mpz_sqrt.

+  // TODO: break the rare gap larger than this into gaps <= 255

If you need to store larger gaps, you can also save in the array gap>>1, 
because they all are even. The prime limit will be able to go beyond 
2^37...


+  /* Avoid having a primegap > 255, first occurrence 436,273,009. */
+  ASSERT( 4000 < sieve_limit && sieve_limit < 43600 );

Why do you need to check (4000 < sieve_limit)?

I'd expect a condition like

+  if (nbits <= numberof (primegap_small) * 2 + 1)
+{
+  primegap = primegap_small;
+  prime_limit = nbits / 2;
+}

so that all primegap_small can be used. If you think this threshold is 
too high, then primegap_small can be reduced...


Ĝis,
m


Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function

2020-03-09 Thread Marco Bodrato
Ciao,

On Mon, 9 March 2020 12:30 pm, Niels Möller wrote:
> Marco Bodrato  writes:
>> For the style, I do not know which one is better:
>> "if(c){v=val}else{v=0};" or "v=0;if(c){v=val};"

> I don't have a very strong opinion on this point, but when it's easy to
> do, I prefer to assign once on each code path.

Agreed.

Ĝis,
m



Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function

2020-03-09 Thread Marco Bodrato
Ciao,

On Mon, 9 March 2020 2:08 pm, Niels Möller wrote:
> Marco Bodrato  writes:
>
>> -gmp_free_func (rden, 0);
>> +gmp_free_func (rden, lden * sizeof (char));
>
> sizeof (char) == 1 by definition, so no need to multiply with it.

Of course you are right!

Ĝis,
m



Re: mpz_prevprime

2020-03-09 Thread Marco Bodrato

Ciao,

Il 2020-03-09 11:59 Seth Troisi ha scritto:

On Mon, Mar 9, 2020 at 2:05 AM Marco Bodrato
 wrote:


Ciao,

Il 2020-03-09 02:56 Seth Troisi ha scritto:



It's highly possible I misunderstand how these macros are supposed to
be used.
I also dislike the boilerplate of the macros but I didn't like


When I say "that code is messy", I mean my code.
And you agree :-) those macros are just easy to misunderstand :-/


If you have suggestions or code pointers for a better pattern I'd
appreciate that.



Did you try to use the gmp_primesieve function directly?


I'm not sure what you're referencing here, I tried with


I'm proposing to get rid of primegap, and use something like the 
following.

This is just a copy-paste from your code, not a really working sequence!

+  __sieve = TMP_ALLOC_LIMBS (primesieve_size (sieve_limit));
+  prime_limit_pre = gmp_primesieve(__sieve, sieve_limit);

+  next_mult = TMP_ALLOC_TYPE (prime_limit, unsigned int);
[...]
+  /* Invert p for modulo */
+  mpz_neg(p, p);

[..handle 3 outside the loop]

+LOOP_ON_SIEVE_BEGIN (prime, n_to_bit (5), n_to_bit (sieve_limit), 
0, __sieve);

+  m = mpz_fdiv_ui(p, prime);
+  /* Only care about odd multiples (even indexes) */
+  if (m & 1)
+m += prime;
+
+  /* mark off any composites in sieve */
+  for (; m < INCR_LIMIT; m += 2*prime)
+sieve[m] = 1;
+  next_mult[i] = m;
+LOOP_ON_SIEVE_END;
+  mpz_neg(p, p);

[...]

+  /* Sieve next segment */
[..handle 3]
+  memset(sieve, 0, INCR_LIMIT);
+  LOOP_ON_SIEVE_BEGIN (prime, n_to_bit (5), n_to_bit (sieve_limit), 
0, __sieve);

+  m = next_mult[i] - INCR_LIMIT;
+  for (; m < INCR_LIMIT; m += prime * 2)
+sieve[m] = 1;
+  next_mult[i] = m;
+  LOOP_ON_SIEVE_END;


Is there a way to unalloc a TMP_ALLOC_LIMBS before TMP_FREE?


No, if you want to unalloc, you must use __GMP_ALLOCATE_FUNC_TYPE, 
__GMP_REALLOCATE_FUNC_TYPE, and __GMP_FREE_FUNC_TYPE.



Sieve next segment is rare (the average gap is ln(n)); if INCR_LIMIT
were changed to be a variable, it could be set so that /* Sieve next
segment */ happened on average less than once, and then...


Why not use a variable for INCR_LIMIT?


In the array char *sieve, you use only the even entries. Maybe you
can reduce its size by half, and remove some "*2" around the code?


it's only 4000 entries so I wasn't bothering at this time, but it


I agree.

PS: did you consider mpz_Cdiv_ui, instead of _neg,_Fdiv_ui,_neg ?

Ĝis,
m


Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function

2020-03-09 Thread Marco Bodrato

Ciao,

Il 2020-03-07 21:27 minux ha scritto:

All comments addressed, except as noted below.
I also fixed similar issues in mini-mpq.c changes.

On Sat, Mar 7, 2020 at 2:26 PM Niels Möller  
wrote:

minux  writes:


I'm not that familiar with the mpq functions. I hope Marco can 
comment.


I had problems with my e-mail address, but I'm here again :-)

Maybe there are some unneeded initialisations = NULL, but details can 
also be refined later...


There is an undocumented "feature" that the proposed patch breaks: 
"safe" failure when the base is out of range.
The question is, should we keep it or not, document it or not, should 
GMP and mini-gmp agree?


I attach a possible patch for tests that check that feature.


> @@ -4199,7 +4208,7 @@



> -  size_t i, sn;
> +  size_t i, sn, osn = 0;



> @@ -4220,15 +4229,15 @@



>if (!sp)
> -sp = (char *) gmp_xalloc (1 + sn);
> +sp = (char *) gmp_xalloc (osn = 1 + sn);

I'd prefer

  if (!sp)
{
  osn = 1 + sn;
  sp = (char *) gmp_xalloc (osn);
}
  else
osn = 0;

(and drop the zero initialization at the declaration above).


There are many other places where we initialize a variable with a 
"default" and we conditionally change the value... I agree with the 
separate line for "osn = 1 + sn".


For the style, I do not know which one is better:
"if(c){v=val}else{v=0};" or "v=0;if(c){v=val};"

Ĝis,
mdiff -r c44538397385 mini-gmp/tests/t-mpq_str.c
--- a/mini-gmp/tests/t-mpq_str.c	Wed Feb 12 20:48:35 2020 +0100
+++ b/mini-gmp/tests/t-mpq_str.c	Mon Mar 09 10:07:10 2020 +0100
@@ -1,6 +1,6 @@
 /*
 
-Copyright 2012-2014, 2016, 2018 Free Software Foundation, Inc.
+Copyright 2012-2014, 2016, 2018, 2020 Free Software Foundation, Inc.
 
 This file is part of the GNU MP Library test suite.
 
@@ -144,7 +144,7 @@
   char *ap;
   char *bp;
   char *rp;
-  size_t rn, arn;
+  size_t rn;
 
   mpq_t a, b;
 
@@ -173,7 +173,6 @@
 	}
 
 	  rn = strlen (rp);
-	  arn = rn - (rp[0] == '-');
 
 	  bp = mpq_get_str (NULL, (i&1 || base > 36) ? base: -base, a);
 	  if (strcmp (bp, rp))
@@ -246,6 +245,16 @@
 	  free (rp);
 	  testfree (bp);
 	}
+
+  /* Check behaviour (undocumented, but conforming with GMP) when
+	 base is out of range. */
+  base = (i&1) ? 63: -37;
+  if (mpq_get_str (bp, base, a))
+	{
+	  fprintf (stderr, "mpz_get_str did not fail, as expected:\n");
+	  fprintf (stderr, "  base = %d\n", base);
+	  abort ();
+	}
 }
   mpq_clear (a);
   mpq_clear (b);
diff -r c44538397385 tests/mpz/convert.c
--- a/tests/mpz/convert.c	Wed Feb 12 20:48:35 2020 +0100
+++ b/tests/mpz/convert.c	Mon Mar 09 10:07:10 2020 +0100
@@ -103,11 +103,10 @@
 
   mpz_urandomb (bs, rands, 32);
   bsi = mpz_get_ui (bs);
-  base = bsi % 62 + 1;
-  if (base == 1)
-	base = 0;
+  base = bsi % 62;
+  base += (base != 0);
 
-  str = mpz_get_str ((char *) 0, base, op1);
+  str = mpz_get_str ((char *) 0, (i&1 || base > 36) ? base: -base, op1);
   mpz_set_str_or_abort (op2, str, base);
 
   if (mpz_cmp (op1, op2))
@@ -122,6 +121,15 @@
 
   (*__gmp_free_func) (str, strlen (str) + 1);
 
+  /* Check behaviour (undocumented, but safe) when
+	 base is out of range. */
+  if (mpz_get_str (str, (i&1) ? 63: -37, op1))
+	{
+	  fprintf (stderr, "ERROR, mpz_get_str did not return NULL for base out of range, in test %d\n", i);
+	  fprintf (stderr, "base = %d\n", (i&1) ? 63: -37);
+	  abort ();
+	}
+
   /* 2. Generate random string and convert to mpz_t and back to a string
 	 again.  */
   mpz_urandomb (bs, rands, 32);


Re: mpz_prevprime

2020-03-09 Thread Marco Bodrato

Ciao,

Il 2020-03-09 02:56 Seth Troisi ha scritto:

From Marco Bodrato's analysis I got inspired and modified nextprime to


Your analysis is much deeper than mine :-)
I do not have much time now... but I glanced at the code and write here 
a few comments.



You can see some of my thresholds and testing input in this google
sheet[1]. The end result is sieve up ~ LOG2(N) ^ (20/7) / 3268


Funny numbers :-)

In the code, you compute LOG2(N) ^ (20/7) / 3268. (A comment says "/ 
1880")

Then, if the result is greater than 43000, you cut it.
Cutting seems a good idea, but I'd do the reverse:
 if nbits is larger than (43000*3268) ^ (7/20) = 17940, set the limit to 
43000; compute it otherwise.


For 10,000 bit numbers this sieves up to 80 million (which takes ~4
seconds but reduces nextprime from ~300 seconds to ~200 seconds)


Seems interesting :-)


I cleaned up the code without any of the debug printing in the 3rd
patch nextprime_sieve_clean.patch which I'd love people to consider


Some random comments:

If you use TMP_DECL, then you don't need to use also TMP_SDECL.
The same for TMP_MARK and TMP_SMARK; or TMP_FREE and TMP_SFREE.


I'm happy to see that you used gmp_primesieve and the macros for it, 
but...
... maybe that code is too messy? I mean, my code with the 
LOOP_ON_SIEVE_ macros...
You did not use that sieve to directly loop on primes; you used it 
to loop on primes to build a primegap table, and then used that table 
to loop on primes...


Did you try to use the gmp_primesieve function directly?
If sieve_limit = 43000, you are allocating 17M for mp_limb_t *sieve, 
then another 22M for primegap_tmp, and both are kept in memory.
I believe that looping with primegap is faster, but... how many times do 
we need to loop, on average? I mean, how often is the /*Sieve next 
segment*/ code used?



In the array char *sieve, you use only the even entries. Maybe you can 
reduce its size by half, and remove some "*2" around the code?


Ĝis,
m


Re: Overflow in mpz_cmp

2020-02-18 Thread Marco Bodrato

Ciao,
Il 2020-02-18 00:35 Guillaume Melquiond ha scritto:

Le 17/02/2020 à 22:59, Marco Bodrato a écrit :


Did you apply formal verification to other functions? Did they 
succeed?



Here is the list of verified functions:



mpn_powm



mpn_sqrtrem


Those are particularly interesting for me!

I recently committed some special flavours of powm: special cases for 
base=2 or when the base is a single limb. It would be really interesting 
to check if they are formally correct.


Moreover, a year ago, or maybe more, I wrote a possible variation of 
sqrt1. I never moved it to the main repository, because I did not have 
the time to analyse its correctness. Again, an instrument to check 
would be nice.



A bit more details can be found there:

https://gitlab.inria.fr/why3/whymp


I'll look into your work, thanks.

Ĝis,
m

--
http://bodrato.it/papers/


Re: Overflow in mpz_cmp

2020-02-17 Thread Marco Bodrato

Ciao,
Il 2020-02-13 08:54 Guillaume Melquiond ha scritto:

Note that mpz_cmpabs does not have any overflow issue, since absolute
sizes are subtracted there. So, except maybe for homogeneity with
mpz_cmp, it can keep using the optimized version.


I pushed a patch for both cmp and cmpabs. So that the second is ready 
for a change in the type of the field _mp_size for the mpz_t variables 
:-)


But you are right, the patch to cmpabs is not needed for the current 
code.


Il 2020-02-10 18:25 Guillaume Melquiond ha scritto:

The bug was noticed when formally verifying this function.


Did you apply formal verification to other functions? Did they succeed?

Ĝis,
m


Re: mpz_prevprime

2020-02-12 Thread Marco Bodrato
Ciao,

Il Mer, 12 Febbraio 2020 7:26 am, Niels Möller ha scritto:
> Marco Bodrato  writes:

>> Maybe my estimates are wrong. If they are not, the limit for trial
>> division should be increased more than linearly on the size of the

> And current code uses
>   prime_limit = nbits / 2;

> Why the constant "/ 2"? No idea, this code seems to be written by
> Torbjörn's and me a decade ago.

Before the rewrite, the code was

void
mpz_nextprime (mpz_ptr p, mpz_srcptr t)
{
  mpz_add_ui (p, t, 1L);
  while (! mpz_probab_prime_p (p, 5))
mpz_add_ui (p, p, 1L);
}

Which had the only benefit (from my personal point of view) of returning
the next (possibly negative) probab_prime when a negative value was given
as an input :-)

The "/2" is one of the many constants that it would be interesting to tune,
but it's not easy to understand exactly how, because we do not even
really know whether we need a multiplicative constant or some other curve.

>> A possible error in my analysis: I assumed a "constant cost" for each
>> new prime. That's true when updating the residues, but each new number
>> in the initial loop on _tdiv_ui imply a cost that is linear on the
>> bit-size of the number x.

> And for a small prime gap (common case), this cost should be the main
> part of the sieving. So it would make sense to optimize it. If we also
> use the ptab, we could compute modulo single-limb prime products using
> precomputed constants.

Yes, of course. And the next question will be: should we generate on the
fly an even larger table of primes when the required size grows beyond the
precomputed table?

Ĝis,
m



Re: mpz_prevprime

2020-02-11 Thread Marco Bodrato

Ciao,

Il 2020-02-10 21:01 ni...@lysator.liu.se ha scritto:
On my laptop, it gives a speedup of about 25% for larger sizes. Not 
sure

how to tune for small sizes, but clearly the old code clamping the size
of the prime table based on the bit size is better than doing nothing.

The computation of all the moduli could be speed up by using ptab, and
by using precomputed reciprocals. Not clear to me how much the speed of
that part matters. I haven't run any profiling.


How often do numbers pass the sieving if we use trial division up to 
primelimit?


Approximately 1/(ln(primelimit)*e^g), where g is the Euler-Mascheroni 
constant. Right?


Assume we had primelimit = e^n and we increase it to e^(n+1).

The number of primes was approximately e^n/n; now it is e^(n+1)/(n+1).

1/(n*e^g) passed trial division; now only 1/((n+1)*e^g) pass it.

For the numbers that did not pass trial division, nothing changes.

For the numbers that did pass trial division, we add e^(n+1)/(n+1)-e^n/n 
tests = e^n*(e-1-1/n)/(n+1). Can we assume a constant cost for each one 
of these tests?
But for one every (n+1) of them, we can avoid any further test (they are 
detected by the new trial divisions). Can we assume that the cost for 
the additional test basically is the cost of Miller-Rabin? Let's call 
this cost MR(x).


So, increasing primelimit from e^n to e^(n+1), we pay 
e^n*(e-1-1/n)/(n+1) and we gain MR(x)/(n+1). It is worth doing if 
primelimit (e^n) is comparable with the cost MR(x). But the cost of 
Miller-Rabin is more than quadratic on the bit-size of the number x, 
right?


Maybe my estimates are wrong. If they are not, the limit for trial 
division should be increased more than linearly on the size of the 
numbers that are tested.



A possible error in my analysis: I assumed a "constant cost" for each 
new prime. That's true when updating the residues, but each new number 
in the initial loop on _tdiv_ui implies a cost that is linear in the 
bit-size of the number x.
This observation gives again an almost linear relation between 
primelimit and the bit-size of the number... and says that the speed of 
the "update residues" loop is not so relevant, at least for large 
numbers; but the speed of the initial moduli probably IS relevant.


Ĝis,
m

--
http://bodrato.it/papers/


Re: Overflow in mpz_cmp

2020-02-11 Thread Marco Bodrato

Ciao,

Il 2020-02-11 14:56 Marc Glisse ha scritto:

On Tue, 11 Feb 2020, Niels Möller wrote:

 if (usize != vsize)
   return (usize > vsize) ? 1 : -1;



On x86_64, both gcc and clang optimize (usize > vsize) ? 1 : -1 to 2 *
(usize > vsize) - 1 (as a single LEA for gcc, 2 ADD for llvm). So the
generated code may be just as good with the simple code.


We know, optimising is a complex task, and we are not writing a compiler 
here. But it is funnier to observe how the compilers translate the last 
line of mpz/cmp.c:

  return (usize >= 0 ? cmp : -cmp);
in the branches where the compiler "knows" that cmp is zero :-)

Ĝis,
m


Re: Overflow in mpz_cmp

2020-02-11 Thread Marco Bodrato

Ciao,

Il 2020-02-11 14:42 ni...@lysator.liu.se ha scritto:

Marco Bodrato  writes:



diff -r f5601c2a8b11 mpz/cmp.c
--- a/mpz/cmp.c Sun Feb 09 16:16:19 2020 +0100
+++ b/mpz/cmp.c Tue Feb 11 14:20:39 2020 +0100
@@ -35,15 +35,15 @@



+  cmp = (usize > vsize) - (usize < vsize);
+  if (cmp != 0)
+return cmp;


I would be tempted to keep it simple,

  if (usize != vsize)
return (usize > vsize) ? 1 : -1;



It's not clear to me if this is worth micro optimizing, and ensure we
get only a single branch.


I believe that current compilers on modern architectures should compile 
the (usize > vsize)?1:-1 expression with no branches.


I tested gcc on amd64 and on arm64, and on both archs your code is 
compiled with the single (usize != vsize) branch.


Your proposal is simpler to read too.

Ĝis,
m

--
http://bodrato.it/papers/


Re: Overflow in mpz_cmp

2020-02-11 Thread Marco Bodrato

Ciao,

Il 2020-02-10 18:25 Guillaume Melquiond ha scritto:

When the operand sizes do not match, the mpz_cmp function just
returns the difference of the signed sizes. Unfortunately, this
difference might not fit inside the "int" return type, when the numbers
are of opposite sign.


In mini-gmp we defined a macro:
#define GMP_CMP(a,b) (((a) > (b)) - ((a) < (b)))

We may use the same idea here too. I mean something like the following:

diff -r f5601c2a8b11 mpz/cmp.c
--- a/mpz/cmp.c Sun Feb 09 16:16:19 2020 +0100
+++ b/mpz/cmp.c Tue Feb 11 14:20:39 2020 +0100
@@ -35,15 +35,15 @@
 int
 mpz_cmp (mpz_srcptr u, mpz_srcptr v) __GMP_NOTHROW
 {
-  mp_size_t  usize, vsize, dsize, asize;
+  mp_size_t  usize, vsize, asize;
   mp_srcptr  up, vp;
   intcmp;

   usize = SIZ(u);
   vsize = SIZ(v);
-  dsize = usize - vsize;
-  if (dsize != 0)
-return dsize;
+  cmp = (usize > vsize) - (usize < vsize);
+  if (cmp != 0)
+return cmp;

   asize = ABS (usize);
   up = PTR(u);
diff -r f5601c2a8b11 mpz/cmpabs.c
--- a/mpz/cmpabs.c  Sun Feb 09 16:16:19 2020 +0100
+++ b/mpz/cmpabs.c  Tue Feb 11 14:20:39 2020 +0100
@@ -36,15 +36,15 @@
 int
 mpz_cmpabs (mpz_srcptr u, mpz_srcptr v) __GMP_NOTHROW
 {
-  mp_size_t  usize, vsize, dsize;
+  mp_size_t  usize, vsize;
   mp_srcptr  up, vp;
   intcmp;

   usize = ABSIZ (u);
   vsize = ABSIZ (v);
-  dsize = usize - vsize;
-  if (dsize != 0)
-return dsize;
+  cmp = (usize > vsize) - (usize < vsize);
+  if (cmp != 0)
+return cmp;

   up = PTR(u);
   vp = PTR(v);


Ĝis,
m


Re: mpz_prevprime

2020-02-09 Thread Marco Bodrato

Ciao,

Il 2020-01-30 21:12 Seth Troisi ha scritto:

I've disabled both gap test but left the code is so that I can
uncomment and run it when doing serious changes to the file.


If the gap tests are basically equivalent to running many repetitions of 
test_ref, you can trigger them when the number of repetitions is larger 
than some adequately large amount.


By the way I  pushed this version of your patch.

https://gmplib.org/repo/gmp/rev/3507f80aee0a


I was referencing my change for mpz_prevprime which is implemented via
a macro shared with mpz_nextprime


In my opinion, the loop in _{prev,next}prime is not really 
time-critical.
So that... I'd like a different approach, not a macro. Moreover, I'm not 
the main developer of the function mpz_nextprime... I'll personally 
leave to the other developers the task of revising the patch and 
possibly integrate it with the current library.


Ĝis,
m


Re: mpz_prevprime

2020-02-05 Thread Marco Bodrato

Ciao,

Il 2020-02-05 00:59 Seth Troisi ha scritto:

I got VERY VERY VERY lucky and found a prime gap > 2^15 with the 4th
highest merit of ANY KNOWN PRIME GAP.


Great!


Given Marco's timing of 25 seconds (and a goal of say 3 seconds on
that machine) the start prime would need to be ~~200 digits.



t...@gmplib.org (Torbjörn Granlund) writes:

We have tried to stick to a limit of 1 s on a reasonable current
computer. Most tests should use much less, if possible.


I underline one word TG used: reasonable. Mine was not :-)

I was only measuring how much the patch increases the time used by the 
test.


But personally I still do not really understand which feature is 
tested by the patched code and not by the current code.


Ĝis,
m

