thank you Marco for your feedback. At the time I wrote that code I'm not sure mpn_neg did exist... Please tell me if you find a faster version of the code.
Paul > Date: Sun, 26 Apr 2015 22:53:49 +0200 (CEST) > From: bodr...@mail.dm.unipi.it > > Ciao Paul! > > Sorry for the long delay, I've been off-line for a long while... > > Il Mar, 3 Febbraio 2015 7:09 pm, paul zimmermann ha scritto: > > I did update my notes and code about fast inversion (which were used as > > basis for Algorithm 3.5 ApproximateReciprocal in "Modern Computer > > Great news! > > > The main change is that now the result is uniquely defined > > > Comments are welcome. > > After a first glance to the code, two lines surprise me: > mpn_com_n (tp, tp, n); > mpn_add_1 (tp, tp, n, ONE); > I wondered why you didn't use > mpn_neg_n (tp, tp, n); > Then I tested (on shell@gmplib) and... > > @shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com > mpn_add_1_inplace.1 > overhead 6.78 cycles, precision 10000 units of 2.86e-10 secs, CPU freq > 3500.08 MHz > mpn_neg mpn_com mpn_add_1_inplace.1 > 1 #5.68 12.54 6.80 > 2 9.40 13.65 #8.19 > 4 16.25 11.40 #8.22 > 8 31.56 16.01 #6.84 > 16 61.86 25.10 #8.16 > 32 139.01 44.79 #6.80 > 64 248.18 85.51 #8.20 > 128 472.77 206.21 #8.38 > 256 918.75 372.29 #8.21 > 512 1915.83 731.53 #6.87 > 1024 3689.67 1472.14 #8.29 > > ...of course you are right. On many architectures we HAVE_NATIVE_mpn_com, > which is faster than the C loop when sizes grow. > > > On the note side: > > The mpn_invert function in GMP currently uses the ApproximateReciprocal > with a final check-and-possibly-correct step triggered by a check of a > single limb. The main difference with respect to your note is that this > step is not performed for all recursion levels, but only once. > > We have two functions: mpn_invert calls mpn_invertappr, the latter returns > a boolean; with your code, something like > return (up[2*h-l-1] + 3 <= CNST_LIMB(2)); /* X might be off by 1 */ > And the former perform the final check if the return value is true. > > > The other difference is that you suggest to use plain multiplication or > the product modulo B^nm+1. On our side we use multiplication modulo B^nm > (to be honest we only truncate linear operations) or wraparound modulo > B^nm-1. I believe the results are the same obtained by the steps you > suggest. And I probably should change the CNST_LIMB(7) used in the last > check with some stricter value. > > We should try if tune/speed is able to detect some speed difference > between the two implementations, probably not... > > Best regards, > Marco > > -- > http://bodrato.it/ _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel