[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)
I think the problem needs to be profiled. The problem is likely NOT in elliptic curves, but in some horrendous chain of calls to module operations before getting to the (same) actual algorithms. Note that P*2 is slightly faster since it calls the member function of P directly rather than a function on integers.

sage: E = EllipticCurve([GF(101)(1),3])
sage: P = E([-1,1,1])
sage: timeit 2*P
1000 loops, best of 3: 309 µs per loop
sage: timeit P+P
1 loops, best of 3: 89.8 µs per loop
sage: timeit P*2
1000 loops, best of 3: 204 µs per loop

Yes, reopen it: these sorts of problems need to be looked at and optimized. The same problem applies to points on Jacobians (compare 2*P, P*2, and P+P).

--David

--
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
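As an aside, the gap David notes between 2*P and P*2 is consistent with how Python dispatches binary operators: 2*P first tries int.__mul__, which returns NotImplemented and falls back to the point's __rmul__, while P*2 goes straight to the point's __mul__. A minimal illustration with a toy class (not Sage's actual point type):

```python
class Point:
    """Toy stand-in for an elliptic curve point; records which hook ran."""
    calls = []

    def __mul__(self, n):
        # P * 2 lands here directly
        Point.calls.append("__mul__")
        return self

    def __rmul__(self, n):
        # 2 * P only reaches this after int.__mul__(2, P) returns NotImplemented
        Point.calls.append("__rmul__")
        return self

P = Point()
P * 2
2 * P
print(Point.calls)  # ['__mul__', '__rmul__']
```

The extra failed int.__mul__ attempt is cheap by itself; the timings in the thread suggest the bulk of the overhead is in the generic layers reached afterwards, not in the dispatch alone.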
[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)
On Aug 24, 8:36 am, David Kohel [EMAIL PROTECTED] wrote:
> I think the problem needs to be profiled. The problem is likely NOT in elliptic curves, but in some horrendous chain of calls to module operations before getting to the (same) actual algorithms. Note that P*2 is slightly faster since it calls the member function of P directly rather than a function on integers.
>
> sage: E = EllipticCurve([GF(101)(1),3])
> sage: P = E([-1,1,1])
> sage: timeit 2*P
> 1000 loops, best of 3: 309 µs per loop
> sage: timeit P+P
> 1 loops, best of 3: 89.8 µs per loop
> sage: timeit P*2
> 1000 loops, best of 3: 204 µs per loop
>
> Yes, reopen it: these sorts of problems need to be looked at and optimized. The same problem applies to points on Jacobians (compare 2*P, P*2, and P+P).
>
> --David

Reopened. The description was changed to "optimize elliptic curve arithmetic: 2*P much slower than P+P" in order to avoid the snafu we just had.

Cheers,
Michael
[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)
Alex Ghitza wrote:
> Hi,

Hello,

> I just noticed that this ticket was closed as fixed. To quote trac:
>
>   I guess this has been fixed. With Sage 2.8.2 I get:
>
>   sage: E = EllipticCurve([GF(101)(1),3])
>   sage: P = E([-1,1,1])
>   sage: timeit 2*P
>   1000 loops, best of 3: 317 µs per loop
>   sage: timeit P+P
>   1 loops, best of 3: 92.7 µs per loop
>
> I'm confused: this says that 2*P still takes 3 times as long as P+P to do the same thing -- I thought this was the problem in the first place?

After rereading the ticket I see the same problem as you do. I was the one who recommended to William that the ticket be closed, so my bad. The ratio of P+P to 2*P has actually gotten worse since the bug report was filed, even though the operation itself has improved. The questions are: just because something is mathematically equivalent, should it take the same time in an actual implementation? How do other CASs fare in this particular case? Can we fix the problem in Sage?

And on that note: if you move away from the factor 2 and keep doubling it, you fairly soon reach a crossover point where the situation is reversed:

sage: E = EllipticCurve([GF(101)(1),3])
sage: P = E([-1,1,1])
sage: timeit 5*P
1000 loops, best of 3: 444 µs per loop
sage: timeit P+P+P+P+P
1000 loops, best of 3: 367 µs per loop
sage: timeit 10*P
1000 loops, best of 3: 534 µs per loop
sage: timeit P+P+P+P+P+P+P+P+P+P
1000 loops, best of 3: 373 µs per loop
sage: timeit 20*P
1000 loops, best of 3: 617 µs per loop
sage: timeit P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P
1000 loops, best of 3: 719 µs per loop
sage: timeit 40*P
1000 loops, best of 3: 712 µs per loop
sage: timeit P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P
100 loops, best of 3: 1.93 ms per loop

Now I know it is somewhat silly to try this with n=20 or higher, but I think it illustrates the point. The question now is: should we reopen the ticket? If so, we should certainly change the description.

Cheers,
Michael

> Cheers,
> Alex
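The crossover Michael measures can be reproduced outside of Sage. The sketch below implements the affine group law on the same curve, y^2 = x^3 + x + 3 over GF(101), in plain Python (naive formulas, not Sage's implementation), with both scalar-multiplication strategies; the absolute numbers and the exact crossover point are machine-dependent.

```python
import timeit

# Affine group law on y^2 = x^3 + A*x + B over GF(p); None is the point at infinity.
p, A, B = 101, 1, 3

def add(P, Q):
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul_naive(n, P):
    """n*P as P + P + ... + P (n - 1 additions)."""
    R = P
    for _ in range(n - 1):
        R = add(R, P)
    return R

def mul_double_add(n, P):
    """n*P by repeated doubling (binary method, ~2*log2(n) additions)."""
    R, Q = None, P
    while n:
        if n & 1:
            R = add(R, Q)
        Q = add(Q, Q)
        n >>= 1
    return R

P = (100, 1)   # the point (-1, 1) from the thread, with -1 = 100 mod 101
for n in (5, 20, 100):
    t_naive = timeit.timeit(lambda: mul_naive(n, P), number=1000)
    t_binary = timeit.timeit(lambda: mul_double_add(n, P), number=1000)
    print(n, t_naive, t_binary)
```

For small n the naive loop does fewer group operations than the binary method (n - 1 versus up to 2*log2(n) additions plus loop bookkeeping), which is exactly why a crossover exists even without Sage's coercion overhead.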
[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)
I think repeated squaring is not more efficient in all cases. For example, over Z it is only more efficient if your multiplication algorithm is faster than the naive one (which is O(n^2) in the number of bits). So in the case of elliptic curves it really depends on the efficiency of the addition steps and on how fast the numerators/denominators grow. I am sure the elliptic curve experts on this forum can say more about this.

Michel

On Aug 14, 10:41 pm, David Harvey [EMAIL PROTECTED] wrote:
> On Aug 14, 2007, at 4:19 PM, Alex Ghitza wrote:
>> Hi,
>>
>> I've looked at ticket #59, in which David Kohel says:
>>
>>   William, my student noticed some slow performance with the elliptic
>>   curve group law. I think there was a huge overhead in duplication:
>>
>>   sage: E = EllipticCurve([GF(101)(1),3])
>>   sage: P = E([-1,1,1])
>>   sage: timeit 2*P
>>   100 loops, best of 3: 3.81 ms per loop
>>   sage: timeit P+P
>>   1000 loops, best of 3: 1.81 ms per loop
>>
>> Basically n*P was passing through all sorts of high-level layers for group schemes, abelian groups, and the like. It turns out that n*P is inherited from multiplication in AdditiveGroupElement, which itself goes all the way to ModuleElement. But at that point, it is implemented efficiently (by repeated doubling). It seems to me that the loss of efficiency in comparison with P+P+...+P is all in the overhead.
>>
>> I experimented with this and noticed that P+...+P is faster than n*P as long as n is smaller than 10; then the repeated doubling starts to bear fruit. So inside ell_point.py, I overloaded the multiplication operator, simply making it return P+...+P if n is less than 10, and return n*P if n is larger than 10 (see below). This is not a very elegant solution. For one thing, the magic number 10 that works on my architecture (Intel Core Duo running Gentoo) might not be right for a completely different machine.
>>
>> Is there a way to automatically generate this number when the user compiles sage from source (by starting with a value like 100, say, and then doing a binary search until it hits the optimal number)?
>
> Even on a single architecture, the correct crossover point will vary between different base fields. For example, if the base field is Q, the best crossover will depend on the size of the coordinates. If the coordinates are enormous, it's probably best to do repeated doubling as much as possible. Similarly, if you are working over GF(p) for some very large p, you'll find a different crossover from GF(101). This is going to be a difficult problem in general.
>
> I think a better approach is to try to reduce the overhead, i.e. to get the n*P pathway working faster, and/or to reduce the overhead in the addition law itself.
>
> david
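Alex's workaround can be sketched in a few lines. The threshold constant and function names here are illustrative, not the actual code in ell_point.py; the group operation is kept abstract so the crossover logic is visible:

```python
CROSSOVER = 10  # machine-dependent magic number, as discussed above

def scalar_mul(n, P, add, zero):
    """Hybrid n*P: repeated addition below CROSSOVER, repeated doubling above."""
    if n < 0:
        raise ValueError("sketch handles n >= 0 only")
    if n == 0:
        return zero
    if n < CROSSOVER:
        # small n: n - 1 plain additions beat the doubling machinery
        R = P
        for _ in range(n - 1):
            R = add(R, P)
        return R
    # large n: binary method (repeated doubling)
    R, Q = zero, P
    while n:
        if n & 1:
            R = add(R, Q)
        Q = add(Q, Q)
        n >>= 1
    return R

# usage with plain integers standing in for points (add = +, zero = 0)
print(scalar_mul(7, 3, lambda a, b: a + b, 0))    # 21, via the addition branch
print(scalar_mul(40, 3, lambda a, b: a + b, 0))   # 120, via the doubling branch
```

As David points out below the quoted text, any fixed CROSSOVER bakes in assumptions about the architecture and the base field, which is why reducing the per-call overhead is the more robust fix.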
[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)
On Aug 15, 2007, at 2:37 AM, Michel wrote:
> I think repeated squaring is not more efficient in all cases. For example, over Z it is only more efficient if your multiplication algorithm is faster than the naive one (which is O(n^2) in the number of bits). So in the case of elliptic curves it really depends on the efficiency of the addition steps and on how fast the numerators/denominators grow. I am sure the elliptic curve experts on this forum can say more about this.

Ah yes, I hadn't thought that through. SAGE uses GMP for the multiplications, which is quasi-linear time; on the other hand, the integer GCD code in GMP is still quadratic (as of GMP 4.2.1), so what you're saying is still relevant, since we are really doing arithmetic over Q.

If you take a non-torsion point P on E(Q), then the bitsize of the numerators/denominators of kP grows like k^2, so the bitsize of 2^m P is about 4^m.

Say we compute 2^m P by repeated doubling. If the arithmetic is quadratic time, then computing 2^m P costs about 1 + 4^2 + 16^2 + ... + (4^m)^2, which is about (4^m)^2 = 16^m. If the arithmetic is linear time (including the GCDs), then computing 2^m P costs about 1 + 4 + 16 + ... + 4^m, which is about 4^m (ignoring log factors).

Now say we compute 2^m P as (P + (P + (P + ( ... (P + P))...))). I'm not sure exactly what happens here; it's a bit more complicated, since at each stage you're adding something small to something big. To keep it simple, we can pretend the small thing is constant, since we're assuming P is fixed. In the linear-time arithmetic case, at step N you have to do at least N^2 work (the bitsize of NP), so the total is 1 + 2^2 + 3^2 + ... + (2^m)^2, which is about 8^m -- so repeated addition definitely loses to repeated doubling.

In the quadratic-time arithmetic case, it's more complicated. Looking at the formula for elliptic curve addition, it appears that to compute P+Q you have to multiply coordinates of Q by coordinates of Q (maybe there's a way of rewriting it, I don't know). If that is true, then step N needs at least N^4 work, so the total is 1 + 2^4 + 3^4 + ... + (2^m)^4, which is about 32^m -- still losing to repeated doubling. If, on the other hand, you can get away with multiplying coordinates of Q only by coordinates of P (which is assumed to be constant), then it's still 8^m, and repeated addition beats repeated doubling.

So if my analysis is wrong, we need an elliptic curve expert; and if it's right, we need an elliptic curve expert! (And in either case it would be great if someone could address this ticket: http://www.sagemath.org:9002/sage_trac/ticket/424)

david
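David's premise, that the coordinates of kP grow like k^2 bits for a non-torsion point over Q, is easy to check numerically with exact rational arithmetic. The curve y^2 = x^3 - 2 and its non-torsion point P = (3, 5) below are a standard example, not taken from the thread:

```python
from fractions import Fraction

# affine group law on y^2 = x^3 + a*x + b over Q (infinity never arises here
# because P has infinite order)
a, b = 0, -2          # y^2 = x^3 - 2; P = (3, 5) is a non-torsion point

def ec_add(P, Q):
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        lam = (3 * x1 * x1 + a) / (2 * y1)   # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)          # chord slope
    x3 = lam * lam - x1 - x2
    return (x3, lam * (x1 - x3) - y1)

P = (Fraction(3), Fraction(5))

# bitsize of the numerator of x(kP) for k = 1, 2, 4, 8, 16
Q = P
for m in range(5):
    print(2 ** m, Q[0].numerator.bit_length())
    Q = ec_add(Q, Q)
```

Each doubling should roughly quadruple the numerator's bit length, matching the 4^m bitsize of 2^m P used in the cost estimates above.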