Dear Torbjörn, > IIRC, Svoboda's division trick for N/D is to find a small multiplier m > such that, for the division mN/(mD) we have mD = 100000...XX... with the > high limb of mD being 1000...0. > > This idea works also for 2-adic division. Find m = D^(-1) mod \beta > where \beta is the lomb base. Then do mN/(mD) or mN mod (mD) with > 2-adic norm. Now mD will end in 0...0001, and the quotient limbs will > be the low limbs of the intermediate remainder. > > That should mean that we could use mul_basecase directly for > sbpi1_bdiv_r (or its sibling redc_1). > > Surely this is not a new observation.
yes, see Section 2.4.2 of Modern Computer Arithmetic, where we call it "Montgomery-Svoboda". The quotient selection becomes trivial, which means one can reduce the latency between two mpn_addmul_1 calls. If you reduce k limbs at a time, by precomputing m = D^(-1) mod \beta^k, then you can use mpn_addmul_k at each step. But to reduce the last k limbs, you need to revert to classical (Montgomery) division, since mN has n+k limbs. Paul _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel