t...@gmplib.org (Torbjörn Granlund) writes: > With some thought, I am sure the table size could come down while the > accuracy could improve.
I see two ways. Either just do an inversion table, which will cost an additional multiplication when used. Than one could handle quotients up to 7 or 8 bits bits with just 256 bytes of table. Or shift both ni and di, and get quotient with something like q = tabp[(ni << NBITS) + di] >> (NBITS - (dcnt - ncnt)) Then one could handle quotients up to 5 bits with a table of 1024 bytes (indexing by 5 significant bits of each of n and d, excluding the most significant one bit). > q0 = tabp[(ni << NBITS) + di]; > r0 = n0 - q0 * d0; > mask = -(mp_limb_t) (r0 >= d0); > q0 -= mask; > r0 -= d0 & mask; You round tabulated q down, and adjust up. I guess one can save one cycle if one instead rounds it up and adjusts down, q0 = tabp[(ni << NBITS) + di]; t = q0 * d0; r0 = n0 - t; mask = -(mp_limb_t) (n0 < t); q0 += mask; r0 += d0 & mask; since computation of mask and initial r0 can then run in parallel (or the same instruction, if the compiler is clever enough to use use carry out). Or is there a risk for overflow in q0 * d0? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel