Re: GMP 6.1.2 t-count_zeros failure on ARM with assertions

2018-01-02 Thread Torbjörn Granlund
    (Related, I wonder what the effect would be of redefining umul_ppmm as C
    expressions involving __uint128_t on compilers that support that).

  We do that already for some CPUs, but this has proven to be somewhat
  fragile, and in unexpected cases leads to libgcc calls.

We dare to do that for at least PowerPC-64, MIPS-64, s390x, Arm64.  For
alpha, gcc provides an _int_mult_upper which we use instead.

Apart from better scheduling, making gcc aware of the semantics allows
for algebraic optimisations and various foldings.
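
To make the idea concrete, here is a minimal sketch of a double-word
multiply written as plain C. It is not the definition in GMP's longlong.h;
the name umul_ppmm_sketch and the function form (rather than a macro) are
assumptions for illustration, and it assumes a compiler that provides
unsigned __int128 (gcc and clang do on 64-bit targets).

```c
#include <stdint.h>

/* Hypothetical stand-in for umul_ppmm: computes the full 128-bit
   product of m0 and m1, returning the high word in *ph and the low
   word in *pl.  Assumes the compiler supports unsigned __int128. */
static inline void
umul_ppmm_sketch (uint64_t *ph, uint64_t *pl, uint64_t m0, uint64_t m1)
{
  /* Widen one operand first so the multiplication is done in 128 bits. */
  unsigned __int128 p = (unsigned __int128) m0 * m1;

  *ph = (uint64_t) (p >> 64);   /* high 64 bits of the product */
  *pl = (uint64_t) p;           /* low 64 bits of the product */
}
```

Because the compiler sees an ordinary multiplication here, it is free to
fold constants or schedule around it. Whether it emits the native multiply
instructions or a libgcc call such as __multi3 depends on the target and
compiler version, so it is worth inspecting the generated assembly.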

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: GMP 6.1.2 t-count_zeros failure on ARM with assertions

2018-01-02 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  Using inline asm instead has the drawback that it leaves a little less
  opportunity for the compiler to schedule these instructions optimally. No
  idea if that matters in practice. Since it seems we don't really need
  count_*_zeros to support zero input, is there any advantage in using
  inline asm?
  
Sure, and that matters chiefly if the instructions have a long latency.
(Now, it is quite likely that CLZ and RBIT didn't get described to the
compiler scheduler, as they are usually not used.)

  (Related, I wonder what the effect would be of redefining umul_ppmm as C
  expressions involving __uint128_t on compilers that support that).
  
We do that already for some CPUs, but this has proven to be somewhat
fragile, and in unexpected cases leads to libgcc calls.

-- 
Torbjörn
Please encrypt, key id 0xC8601622


Re: GMP 6.1.2 t-count_zeros failure on ARM with assertions

2018-01-02 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

> We might define these directly, at least for arm64, to CLZ and RBIT+CLZ,
> respectively, instead of using gcc's builtin semi-defined variants?

Using inline asm instead has the drawback that it leaves a little less
opportunity for the compiler to schedule these instructions optimally. No
idea if that matters in practice. Since it seems we don't really need
count_*_zeros to support zero input, is there any advantage in using
inline asm?
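
For comparison, the two approaches might look roughly like the sketch
below. These are not GMP's count_leading_zeros and count_trailing_zeros
macros; the names clz64_sketch, ctz64_sketch and the USE_ASM_SKETCH switch
are made up for illustration. The builtin path is used by default, and the
inline asm path is only compiled on arm64 when USE_ASM_SKETCH is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not GMP's actual macros.  Both require x != 0,
   since __builtin_clzll(0) and __builtin_ctzll(0) are undefined. */

static inline unsigned
clz64_sketch (uint64_t x)
{
  assert (x != 0);
#if defined (__aarch64__) && defined (USE_ASM_SKETCH)
  uint64_t r;
  /* Opaque to the compiler: it cannot fold or reason about this. */
  __asm__ ("clz %0, %1" : "=r" (r) : "r" (x));
  return (unsigned) r;
#else
  /* Visible to the compiler: can be scheduled and constant-folded. */
  return (unsigned) __builtin_clzll (x);
#endif
}

static inline unsigned
ctz64_sketch (uint64_t x)
{
  assert (x != 0);
#if defined (__aarch64__) && defined (USE_ASM_SKETCH)
  uint64_t r;
  /* Reverse the bits, then count leading zeros of the result. */
  __asm__ ("rbit %0, %1\n\tclz %0, %0" : "=r" (r) : "r" (x));
  return (unsigned) r;
#else
  return (unsigned) __builtin_ctzll (x);
#endif
}
```

For example, clz64_sketch (1) gives 63 and ctz64_sketch (8) gives 3 on
either path. The difference only shows up for a zero argument, which both
versions here reject with an assertion.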

(Related, I wonder what the effect would be of redefining umul_ppmm as C
expressions involving __uint128_t on compilers that support that).

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.