Although GMP is designed for very long numbers, sometimes systems using it would have task divide amd mod very short numbers : one limb by one limb. For example if we factor this number. My suggestion (is has correct values?) is:
void mpn_tdiv_qr (mp_ptr qp, mp_ptr rp, mp_size_t qxn, mp_srcptr np, mp_size_t nn, mp_srcptr dp, mp_size_t dn) .......................... ......................... case 1: { if (nn = 1) <---------------- { qp[0] = *np / *dp; rp[0] = *np % *dp; } else rp[0] = mpn_divrem_1 (qp, (mp_size_t) 0, np, nn, dp[0]); return; } is called twice machine divrem but is I add assignment to limbs, it is speedup, because doesn't careof memory changes. Better than using div structure from <stdlib.h> because this structure is signed. Finally: if (nn = 1) { mp_limb_t n = *np; mp_limb_t d = *dp; qp[0] = n / d; rp[0] = n % d; } VisualC++ give mp_limb_t n = *np; mp_limb_t d = *dp; qp[0] = n / d; 00007FF75A0727B0 mov rax,qword ptr [r9] 00007FF75A0727B3 mov r10,rdx 00007FF75A0727B6 mov r8,qword ptr [dp] 00007FF75A0727BB xor edx,edx 00007FF75A0727BD div rax,qword ptr [r8] 00007FF75A0727C0 mov qword ptr [rcx],rax rp[0] = n % d; It is correct? These values must return mpn_tdiv_qr? It is possible also sppedup larger (more complicated) 2limbs/1limb ? My bench for million (1 to 1000000) loop and 10 measurement, choosing best on i3 4160, 3.60 GHz machine: 7.9392 ms boost cpp_int: 21.7078 ms gmp: 33.8823 ms gmp with change 8.7771 Speedup 3.86 x _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel