t...@gmplib.org (Torbjörn Granlund) writes:

> Looking at when method 1 or 3 is faster than 2 is more interesting.
> Method 1 and to some extent also method 3 would benefit from asm code,

How would method 1 benefit from asm code? To use instructions providing
both quotient and remainder? For x86_64, it looks like gcc uses

  xor %edx, %edx
  div %rcx

to get quotient and remainder (i.e., it uses the 128/64 division
instruction; there seems to be no 64/64 division instruction?).

Methods 2 and 3 beg for using subtract + cmov; I'm not sure the
compiler is clever enough to do that for the code using masks.
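To spell out what I mean, a division step in the two styles could look
like this (an untested sketch only, not the actual div1/div2 code; the
names step_mask and step_cmov are made up, and mp_limb_t from gmp.h is
assumed):

  #include <gmp.h>  /* mp_limb_t */

  /* One shift-and-subtract division step, written with a mask in the
     style of the code using masks. */
  static inline void
  step_mask (mp_limb_t *n, mp_limb_t *q, mp_limb_t d)
  {
    mp_limb_t mask = - (mp_limb_t) (*n >= d);  /* all ones iff *n >= d */
    *n -= d & mask;
    *q = (*q << 1) - mask;  /* -mask is the quotient bit, 0 or 1 */
  }

  /* The same step written so that gcc and clang typically emit a
     conditional move instead of a branch. */
  static inline void
  step_cmov (mp_limb_t *n, mp_limb_t *q, mp_limb_t d)
  {
    mp_limb_t t = *n - d;  /* wraps harmlessly when *n < d */
    int ge = (*n >= d);
    *n = ge ? t : *n;      /* subtract + cmov */
    *q = (*q << 1) | (mp_limb_t) ge;
  }

Whether the mask variant actually compiles to cmov is up to the
compiler; the second form makes the intent explicit.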
> Would your present tuneup/speed setup allow measuring of asm code?

I think so: ./speed mpn_hgcd2 will measure the mpn_hgcd2 function in
the library, with no extra hacks. It may need some tricks to be able
to select the other methods and ignore the asm version, but I'd expect
that to be fairly easy.

> The current div1 measurements include hgcd2's own time, right? I.e., if
> we found a div1 which runs in zero cycles, the timings would not be
> zero.

Yes. Do you think it makes sense to measure div1 in isolation? Since
performance depends so much on the distribution of the inputs,
measuring hgcd2 makes sense to me.

I'm appending another iteration of the patch, which adds a div2
function based on applying div1 to the high limbs. The variant is
selected via HGCD2_DIV2_METHOD. Benchmarks:

  HGCD2_DIV2_METHOD  mpn_hgcd2_1  mpn_hgcd2_2  mpn_hgcd2_3
  1                     #1504.47      1740.53      1513.56  div1 + unlikely case
  2                     #1739.66      1858.01      1753.73  current code
  3                     #1615.32      1736.69      1621.95  the #if:ed-out version

Regards,
/Niels

*** /tmp/extdiff.C9Ot80/gmp.217337dadd8e/mpn/generic/hgcd2.c	2019-09-15 14:37:42.395279848 +0200
--- /home/nisse/hack/gmp/mpn/generic/hgcd2.c	2019-09-15 14:32:06.745439876 +0200
***************
*** 41,44 ****
--- 41,48 ----
  #endif
  
+ #ifndef HGCD2_DIV2_METHOD
+ #define HGCD2_DIV2_METHOD 3
+ #endif
+ 
  #if GMP_NAIL_BITS != 0
  #error Nails not implemented
***************
*** 139,142 ****
--- 143,201 ----
  #endif
  
+ #if HGCD2_DIV2_METHOD == 1
+ static mp_limb_t
+ div2 (mp_ptr rp,
+       mp_limb_t n1, mp_limb_t n0,
+       mp_limb_t d1, mp_limb_t d0)
+ {
+   mp_double_limb_t rq = div1 (n1, d1);
+   if (UNLIKELY (rq.d1 > d1))
+     {
+       mp_limb_t n2, q, t1, t0;
+       int c;
+ 
+       /* Normalize */
+       count_leading_zeros (c, d1);
+       ASSERT (c > 0);
+ 
+       n2 = n1 >> (GMP_LIMB_BITS - c);
+       n1 = (n1 << c) | (n0 >> (GMP_LIMB_BITS - c));
+       n0 <<= c;
+       d1 = (d1 << c) | (d0 >> (GMP_LIMB_BITS - c));
+       d0 <<= c;
+ 
+       udiv_qrnnd (q, n1, n2, n1, d1);
+       umul_ppmm (t1, t0, q, d0);
+       if (t1 > n1 || (t1 == n1 && t0 > n0))
+         {
+           ASSERT_ALWAYS (q > 0);
+           q--;
+           n1 += d1;
+           sub_ddmmss (t1, t0, t1, t0, 0, d0);
+         }
+       sub_ddmmss (n1, n0, n1, n0, t1, t0);
+ 
+       /* Undo normalization */
+       rp[0] = (n0 >> c) | (n1 << (GMP_LIMB_BITS - c));
+       rp[1] = n1 >> c;
+ 
+       return q;
+     }
+   else
+     {
+       mp_limb_t q, t1, t0;
+       n1 = rq.d0;
+       q = rq.d1;
+       umul_ppmm (t1, t0, q, d0);
+       if (UNLIKELY (t1 >= n1) && (t1 > n1 || t0 > n0))
+         {
+           q--;
+           sub_ddmmss (t1, t0, t1, t0, d1, d0);
+         }
+       sub_ddmmss (rp[1], rp[0], n1, n0, t1, t0);
+       return q;
+     }
+ }
+ #elif HGCD2_DIV2_METHOD == 2
  /* Two-limb division optimized for small quotients. */
  static inline mp_limb_t
***************
*** 198,202 ****
  }
  
! #if 0
  /* This div2 uses less branches, but relies on fast count_leading_zeros. */
  static inline mp_limb_t
--- 257,261 ----
  }
  
! #elif HGCD2_DIV2_METHOD == 3
  /* This div2 uses less branches, but relies on fast count_leading_zeros. */
  static inline mp_limb_t
***************
*** 239,242 ****
--- 298,303 ----
    return q;
  }
  
+ #else
+ #error Unknown HGCD2_DIV2_METHOD
  #endif
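If it helps review, the contract that all three div2 variants should
satisfy can be checked against __int128 reference arithmetic. An
untested sketch, not part of the patch (it assumes 64-bit limbs, gcc's
unsigned __int128 extension, and inputs for which the quotient fits in
a single limb, as in hgcd2; check_div2 is a made-up name):

  #include <assert.h>
  #include <stdint.h>

  typedef unsigned __int128 u128;

  /* Check n = q*d + r with 0 <= r < d, where q is the return value of
     div2 and rp[1], rp[0] are the stored remainder limbs. */
  static void
  check_div2 (uint64_t q, const uint64_t rp[2],
              uint64_t n1, uint64_t n0, uint64_t d1, uint64_t d0)
  {
    u128 n = ((u128) n1 << 64) | n0;
    u128 d = ((u128) d1 << 64) | d0;
    u128 r = ((u128) rp[1] << 64) | rp[0];

    assert (r < d);
    assert (n == (u128) q * d + r);
  }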
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel