ni...@lysator.liu.se (Niels Möller) writes: > * It might be possible to implement this cheaper than udiv_qr_3by2, > omitting most of the book-keeping and carry propagation involving the > low remainder word r0.
I've made some progress, not yet clear if it will work out. I think I can prove the following, analogous to the theorems in the paper: Let d = {d_1, d_0} be the normalized divisor, B^2 / 2 <= d < B^2. Let v be the same inverse as for 3/2, v = floor ((B^3-1) / d) - B Let {u_2, u_1} < d be the top limbs of the numerator. Compute the candidate quotient + fraction, in the same way as for 3/2: {q_1, q_0} = (v+B) u_2 + u_1 = v u_2 + {u_2, u_1} Let r be the candidate remainder, almost single word r = {u_2, u_1} - floor ((q_1 + 1) d / B) Then r lies in the range c - B = r < c, with c = max(B - d_1, q_0) Hence, r is uniquely determined by r mod B, q_0 and d_1. Still unclear how well it can work out. To guarantee that we produce a q which never is too small for schoolbook division, we might need to add some adjustment term, like r = {u_2, u_1} - floor (((q_1 + 1) d + B - 1)/ B) and then we're almost back to computing the next remainder limb. Another alternative might be arrange so that the final q corresponds to a remainder in the range -1 <= r < d_1 In this case, maybe it's best to return only q, not r (unless we can find a nice way to handle the -1 case in the schoolbook division loop). Starting with q = q_1 + 1, p_0 = q*d_0 (mod B), and r computed only mod B, the adjustment steps would be something like if (r >= q_0) /* Needs to be branch free */ { q--; r += d_1 + (d_0 > p_0); p_0 -= d_0; } if (r >= d_1) /* Unlikely */ { q++; p_0 += d_0; r -= d_1 + (p_0 < d_0); /* Might underflow */ } assert (p_0 == q * d_0); return q; /* And also return r and p_0, when we've went to the trouble to update them. */ The comparisons involving d_0 and p_0 in the r update are needed to update the contribution from mulhi(q*d_0) when we increment and decrement q. Maybe we don't need them if we only return q, and leave to the caller to recompute r (e.g, doing a submul_1 on the complete unnormalized operands). If we can get away with that, it gets a lot simpler, if (r >= q_0) /* Needs to be branch free */ { q--; r += d_1; } return q + (r >= d_1); Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel