t...@gmplib.org (Torbjörn Granlund) writes: > Not exactly. I think running div1 like hgcd2 but without any of hgcd2 > bookkeeping would make some sense. I.e., feed div1 with the input of > Euclid's algorithm. To avoid skew from particular operands, perhaps > table 10 uniformly distrinuted random numbers, then do all (a,b) pairs > from that table. > > The idea is to make measurements more accurate.
Makes sense. If better accuracy is needed. > I've written several div1 in asm (arm v5 method 2, 64-bit arm v8 method > 2 and 3, 64-bit x86 method 2). Nice. > Since these will not be static or inlined, I call them mpn_div_11. I > am not sure how to time them within your framework. I think adding a case #if HAVE_NATIVE_MPN_DIV_11 #define div1 mpn_div_11 #else ... other versions depending on HGCD2_DIV1_METHOD #endif in hgcd2.c, and then using ./tune/speed mpn_hgcd2, should work. To not break tuneup of the C versions, one would need to amend that to #if HAVE_NATIVE_MPN_DIV_11 && !TUNING_BUILD and add a #define TUNING_BUILD 1 to the various tune/hgcd2_*.c files. (Or maybe some other define than TUNING_BUILD, since this is meant to affect both the tuneup and the speed programs). > I now believe a table-based thing which gets the quotient right > 90% of > the time will be the best approach. What should the table look like? If one looks up only the divisor bits, (i.e., a reciprocal table) one gets better precision with smaller table size. If one looks up bits of both inputs, table gets larger. One would then tabulate something like 1uuuuu / 1.ddddd, and shift the result depending on the bit size difference. > And perhaps hgcd2 could be made to cope with inaccurate quotients, and > that without itself executing unpredictable branches? As long as > progress is made, perhaps sloppy quotients can be tolerated? After a' = a - q*b, we currently expect 0 <= a' < b (and exit if a' is close to zero). Allowing a' < 0 is not an easy change; one needs an absolute value, and maybe a way to deal with negative matrix elements. Allowing a' > b means that next step needs another a' - b, not b - a'. To handle that, we need a conditional swap of four pairs of numbers (three pairs in the single-limb loop). And it has to be rare enough to not increase the number of iterations. It's not obvious to me that this can be faster than doing a quotient adjustment in the current iteration. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel