David Miller <da...@davemloft.net> writes:

  This turned out to be easy, you were using %o5 as a register for 'dinv'
  but this gets clobbered elsewhere in the code, using %o4 instead fixes
  the problems.

Well, I suppose that was another of my "safe" last-minute fixes. :-)
Attached is a dive_1.asm that works for me on real hardware, as well as
T4 timings from:

  tune/speed -p10000000 -s1-1000 -f1.1 -C mpn_divexact_1.3

Terrible speed, as expected on these machines for code that relies on
mul *latency*.

We will need to compute d^(-1) mod B^2 (or B^k, k > 2), where B is the limb
base.  With such an inverse, we will develop k quotient limbs at a time,
using several *independent* limb multiplies.

There is a bdiv_qr_1_pi2 lurking, which does this for k = 2.

-- 
Torbjörn
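To illustrate the k = 2 idea described above, here is a minimal C sketch.
It is not the bdiv_qr_1_pi2 code and not GMP's API: it assumes 32-bit limbs
(B = 2^32) so that B^2 = 2^64 fits in a uint64_t, takes the operand as
64-bit chunks of two limbs each, and uses hypothetical names throughout
(inverse_mod_b2, mulhi_64x32, divexact_by_pi2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: B = 2^32, so B^2 = 2^64 fits in a uint64_t.
   All function names are hypothetical, not GMP functions. */

/* Inverse of an odd d modulo B^2 = 2^64, by Newton iteration. */
static uint64_t inverse_mod_b2(uint32_t d)
{
    assert(d & 1);                 /* an inverse exists only for odd d */
    uint64_t x = d;                /* d*d == 1 (mod 8): 3 correct bits */
    for (int i = 0; i < 5; i++)    /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
        x *= 2 - (uint64_t)d * x;  /* arithmetic wraps mod 2^64 */
    return x;
}

/* floor(q * d / 2^64) for a two-limb q and a one-limb d.
   The two 32x32->64 products do not depend on each other. */
static uint64_t mulhi_64x32(uint64_t q, uint32_t d)
{
    uint64_t lo = (q & 0xffffffffu) * d;
    uint64_t hi = (q >> 32) * d;
    return (hi + (lo >> 32)) >> 32;   /* the sum cannot overflow 64 bits */
}

/* Exact division of {u, n} by odd d, producing two 32-bit quotient
   limbs (one 64-bit chunk) per iteration.  Returns the final carry,
   which is 0 when d divides the operand exactly. */
uint64_t divexact_by_pi2(uint64_t *q, const uint64_t *u, size_t n, uint32_t d)
{
    uint64_t dinv = inverse_mod_b2(d);   /* d^(-1) mod B^2 */
    uint64_t c = 0;                      /* carry from the previous chunk */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = u[i] - c;           /* wraps mod B^2 on borrow */
        uint64_t borrow = u[i] < c;
        q[i] = s * dinv;                 /* two quotient limbs at once */
        c = mulhi_64x32(q[i], d) + borrow;
    }
    return c;
}
```

For example, dividing 5*2^64 + 1 (chunks {1, 5}) by 3 gives the chunks
{0xAAAAAAAAAAAAAAAB, 1} with a final carry of 0.  The loop still has a
dependency through the carry, but it is paid once per two limbs rather
than once per limb, which is the point of the wider inverse.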