sec_invert performance
I'm confused by the pretty bad performance of sec_invert. Try e.g., ./tune/speed -s 1-200 -P invert mpn_gcdext mpn_sec_invert mpz_powm_sec Say we have inputs of n bits. Then my understadning of these algorithms is: * mpn_gcdext, for sizes of interest, is Lehmer's algorithm, O(n^2), with a reasonably small constant since it reduces roughly one limb per iteration. * mpn_sec_powm does n squarings + redc (and then some n/window_size multiplies and table lookups). And the squarings and redc are O(n^2), so total running time should be O(n^3). * mpn_sec_invert has a loop of 2n iterations. Each iterations makes 9 calls to various O(n) functions, like mpn_cnd_add_n, mpn_rshift, etc. So it ought to be O(n^2), and a constant factor slower than mpn_gcdext. But when I run speed, *both* ratios mpn_invert / mpn_gcdext and mpz_sec_powm / mpn_gcdext keep growing with n. So mpn_invert is not a constant factor slower than mpn_gcdext, as I would have expected, and it's significantly slower than mpn_sec_powm for largish sizes like 100 limbs. So it looks like it's O(n^3), not O(n^2). Any idea what's going on? And then, there's also a sharp step at n = 113 or 114 (maybe that's an artifact of the testmachine, but it's reproducible here). $ ./speed -s 100-120 -c -r mpn_gcdext mpn_sec_invert mpz_powm_sec overhead 5.42 cycles, precision 1 units of 6.25e-10 secs, CPU freq 1600.00 MHz mpn_gcdext mpn_sec_invert mpz_powm_sec 100#286753.00 97.9985 68.7822 101#290981.00 96.3812 69.3512 102#297710.00 94.5395 68.9482 103#301772.00 94.9747 69.6444 104#308297.00 94.8706 69.0619 105#310131.00 96.1709 70.7960 106#312842.00 96.6907 70.8039 107#321814.00 96.2753 70.4176 108#313583.00 100.2901 73.2001 109#321223.00 100.3469 72.9247 110#328283.00 99.1086 72.5408 111#333812.00 99.4312 72.9408 112#335619.00 100.7185 73.4388 113#343734.00 46.7804 73.1944 114#352574.00 46.8327 72.9897 115#356299.00 46.9179 73.3826 116#366523.00 46.0239 72.2455 117#371422.00 46.5541 72.7283 118#367404.00 50.3468 74.7026 119#374872.00 48.6011 74.7945 120#376131.00 48.3491 75.4658 Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: sec_invert performance
ni...@lysator.liu.se (Niels Möller) writes: Any idea what's going on? I quick guess is that the exponent is fixed for powm, not a function of the input size. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: sec_invert performance
Torbjorn Granlund t...@gmplib.org writes: I quick guess is that the exponent is fixed for powm, not a function of the input size. Ah, its hardcoded to always use 6 limbs. If I change it to use the same size as b and m, results are a bit different: $ ./speed -s 1-64 -c -r mpn_gcdext mpn_sec_invert mpz_powm_sec overhead 5.43 cycles, precision 1 units of 6.25e-10 secs, CPU freq 1600.00 MHz mpn_gcdext mpn_sec_invert mpz_powm_sec 1#1393.539.25632.8281 2#3700.838.45302.8537 3#5628.829.22964.7198 4#7387.089.87056.5449 5#9270.67 12.22719.4494 6 #11102.89 13.2985 11.9077 7 #12923.54 14.2668 14.8305 8 #14985.50 14.9848 17.2423 9 #16826.96 17.1405 21.4808 10 #18998.00 17.9614 24.7252 11 #20390.00 19.6006 29.7402 12 #23244.21 19.9373 32.0728 13 #24235.58 22.4859 39.6380 14 #26027.70 23.5316 44.4109 15 #29157.89 23.7497 47.7260 16 #30368.62 26.6497 53.3113 17 #32534.43 28.4042 60.1591 18 #33580.67 29.1482 69.0387 19 #34424.60 31.2023 78.6099 20 #36470.00 32.4313 84.8543 21 #36405.25 35.8625 99.2162 22 #41335.25 33.8338 97.9815 23 #39861.67 38.2781 116.4425 24 #41421.67 39.4835 124.2736 25 #44087.67 41.0079 138.0154 26 #48386.33 39.9403 139.1569 27 #40868.00 50.5410 184.1573 So if you know that the modulo is prime (or has some other type of known factorization), using powm actually beats sec_invert for sizes up to 6 limbs. I still puzzled why the ratio between sec_invert and gcdext doesn't approach some constant (I'd expect the ratio to start growing again for sizes above the threshold for subquadratic gcdext). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: sec_invert performance
Torbjorn Granlund t...@gmplib.org writes: All linear-time functions really take Bn+C time, and quadratic-time functions take An²+Bn+C, of course. If the sec_invert function has small B and/or C while gcdext has larger B and/or C, then the observed pattern seem even more reasonable. Do you agree? Not sure. I had expected the quadratic terms to dominate (and hence the ratio almost converging to the ratio between the n^2 coefficients), well before we cross the GCDEXT_DC_THRESHOLD which is on the order of 500 limbs. But I haven't had any deep thoughts about it. (I guess one could also try to estimate all of A, B and C from the measurements. Putting such numerics into speed is something I'd like to do sometime). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel