[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)

2007-08-24 Thread David Kohel

I think the problem needs to be profiled.  The problem is likely NOT
in elliptic curves, but in some horrendous chain of calls to module
operations before getting to the (same) actual algorithms.  Note that
P*2 is slightly faster since it directly calls the member function of
P rather than a function on integers.

sage: E = EllipticCurve([GF(101)(1),3])
sage: P = E([-1,1,1])
sage: timeit 2*P
1000 loops, best of 3: 309 µs per loop
sage: timeit P+P
1 loops, best of 3: 89.8 µs per loop
sage: timeit P*2
1000 loops, best of 3: 204 µs per loop

Yes, reopen it: these sorts of problems need to be looked at and
optimized.  The same problem applies to points on Jacobians (compare
2*P, P*2, and P+P).

--David



--~--~-~--~~~---~--~~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~--~~~~--~~--~--~---



[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)

2007-08-24 Thread mabshoff

On Aug 24, 8:36 am, David Kohel [EMAIL PROTECTED] wrote:
 I think the problem needs to be profiled.  The problem is likely NOT
 in elliptic curves, but in some horrendous chain of calls to module
 operations before getting to the (same) actual algorithms.  Note that
 P*2 is slightly faster since it directly calls the member function of
 P rather than a function on integers.

 sage: E = EllipticCurve([GF(101)(1),3])
 sage: P = E([-1,1,1])
 sage: timeit 2*P
 1000 loops, best of 3: 309 µs per loop
 sage: timeit P+P
 1 loops, best of 3: 89.8 µs per loop
 sage: timeit P*2
 1000 loops, best of 3: 204 µs per loop

 Yes, reopen it: these sorts of problems need to be looked at and
 optimized.  The same problem
 applies to points on Jacobians (compare 2*P, P*2, and P+P).

 --David

Reopened. The description was changed to "optimize elliptic curve
arithmetic: 2*P much slower than P+P" in order to avoid the snafu we
just had.

Cheers,

Michael





[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)

2007-08-23 Thread mabshoff


Alex Ghitza wrote:

 Hi,

Hello,


 I just noticed that this ticket was closed as fixed.  To quote trac:

 - -
 I guess this has been fixed. With Sage 2.8.2 I get:

 sage: E = EllipticCurve([GF(101)(1),3])
 sage: P = E([-1,1,1])
 sage: timeit 2*P
 1000 loops, best of 3: 317 µs per loop
 sage: timeit P+P
 1 loops, best of 3: 92.7 µs per loop
 - -

 I'm confused: this says that 2*P still takes 3 times as long as P+P to
 do the same thing -- I thought this was the problem in the first place?

After rereading the ticket I see the same problem as you do. I was the
one who recommended that William close the ticket, so my bad. The
ratio between 2*P and P+P has actually gotten worse since the bug
report was filed, even though each operation has itself gotten faster.

The question is: just because two computations are mathematically
equivalent, should they take the same time in an actual implementation?
How do other CASs fare in this particular case? Can we fix the problem
in Sage?

And on that note: if you move beyond the factor 2 and keep doubling
the multiplier, you fairly soon reach a crossover point where the
situation is reversed:

sage: E = EllipticCurve([GF(101)(1),3])
sage: P = E([-1,1,1])

sage: timeit 5*P
1000 loops, best of 3: 444 µs per loop
sage: timeit P+P+P+P+P
1000 loops, best of 3: 367 µs per loop

sage: timeit 10*P
1000 loops, best of 3: 534 µs per loop
sage: timeit P+P+P+P+P+P+P+P+P+P
1000 loops, best of 3: 373 µs per loop

sage: timeit 20*P
1000 loops, best of 3: 617 µs per loop
sage: timeit P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P
1000 loops, best of 3: 719 µs per loop

sage: timeit 40*P
1000 loops, best of 3: 712 µs per loop
sage: timeit P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P+P
+P+P+P+P+P+P+P+P+P+P+P
100 loops, best of 3: 1.93 ms per loop

Now I know that it is somewhat silly to try this with n=20 or higher,
but I guess it illustrates a point.
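The crossover can also be modelled outside Sage with a toy cost model:
repeated addition pays one cheap group addition per step, while
double-and-add uses far fewer group operations but pays a fixed
dispatch overhead on each one. A hypothetical sketch in plain Python
(the cost numbers and the `crossover` helper are invented for
illustration, not measured Sage overheads):

```python
# Toy cost model: repeated addition does (n-1) cheap adds;
# double-and-add does ~log2(n) doublings plus popcount(n)-1 adds,
# but each of its calls pays a fixed dispatch overhead.
def cost_repeated_addition(n, add_cost=1.0):
    return (n - 1) * add_cost

def cost_double_and_add(n, add_cost=1.0, overhead=3.0):
    doublings = n.bit_length() - 1
    additions = bin(n).count("1") - 1
    return (doublings + additions) * (add_cost + overhead)

def crossover(add_cost=1.0, overhead=3.0):
    """Smallest n at which double-and-add becomes cheaper."""
    n = 2
    while cost_double_and_add(n, add_cost, overhead) >= \
            cost_repeated_addition(n, add_cost):
        n += 1
    return n
```

With these made-up constants the crossover lands in the low tens,
qualitatively matching the behaviour seen in the timings above.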

The question now is: should we reopen the ticket? If so, we certainly
should change the description.

Cheers,

Michael

 Cheers,
 Alex






[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)

2007-08-15 Thread Michel

I think repeated squaring is not more efficient in all cases. For
example, over Z it is only more efficient if your multiplication
algorithm is faster than the naive one (which is O(n^2) in the number
of bits).

So in the case of elliptic curves it really depends on the efficiency
of the addition steps and on how fast the numerators/denominators
grow. I am sure the elliptic curve experts on this forum can say more
about this.
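For readers following along, "repeated squaring" here is the
multiplicative analogue of the double-and-add strategy that n*P uses.
A generic sketch in Python (illustrative only; `scalar_multiply` and
its arguments are invented names, not Sage API):

```python
def scalar_multiply(n, P, add, zero):
    """Compute n*P via double-and-add, using only the group law `add`.

    Uses O(log n) calls to `add`, versus n-1 for repeated addition.
    """
    result, base = zero, P
    while n > 0:
        if n & 1:                    # this bit of n is set: accumulate
            result = add(result, base)
        base = add(base, base)       # doubling step
        n >>= 1
    return result
```

Whether the O(log n) operation count wins in practice depends, as
noted above, on how expensive each `add` becomes as the operands grow.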

Michel


On Aug 14, 10:41 pm, David Harvey [EMAIL PROTECTED] wrote:
 On Aug 14, 2007, at 4:19 PM, Alex Ghitza wrote:



  Hi,

  I've looked at ticket #59, in which David Kohel says:

William, my student noticed some slow performance with elliptic
  curves
group law. I think there was a huge overhead in duplication:

sage: E = EllipticCurve([GF(101)(1),3])
sage: P = E([-1,1,1])
sage: timeit 2*P
100 loops, best of 3: 3.81 ms per loop
sage: timeit P+P
1000 loops, best of 3: 1.81 ms per loop

Basically n*P was passing through all sorts of high-level layers for
group schemes, abelian groups, and the like.

  It turns out that n*P is inherited from multiplication in
  AdditiveGroupElement, which itself goes all the way to ModuleElement.
  But at that point, it is implemented efficiently (by repeated
  doubling).
   It seems to me that the loss of efficiency in comparison with P+P
  +...+P
  is all in the overhead.

  I experimented with this and noticed that P+...+P is faster than
  n*P as
  long as n is smaller than 10, then the repeated doubling starts to
  bear
  fruit.  So inside ell_point.py, I overloaded the multiplication
  operator, simply making it return P+...+P if n is less than 10, and
  return n*P if n is larger than 10 (see below).

  This is not a very elegant solution.  For one thing, the magic
  number 10
  that works on my architecture (Intel Core Duo running Gentoo) might
  not
  be right for a completely different machine.  Is there a way to
  automatically generate this number when the user compiles sage from
  source (by starting with a value like 100, say, and then doing a
  binary
  search until it hits the optimal number)?

 Even on a single architecture, the correct crossover point will vary
 between different base fields.

 For example if the base field is Q, the best crossover will depend on
 the size of the coordinates. If the coordinates are enormous, it's
 probably best to do repeated doubling as much as possible. Similarly
 if you are working over GF(p) for some very large p, you'll find a
 different crossover from GF(101).

 This is going to be a difficult problem in general. I think a better
 approach is to try to reduce the overhead, i.e. to get the n*P
 pathway working faster, and/or to reduce the overhead in the addition
 law itself.

 david





[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)

2007-08-15 Thread David Harvey


On Aug 15, 2007, at 2:37 AM, Michel wrote:

 I think repeated squaring is not more efficient in all cases. For
 example over Z it is only
 more efficient if your multiplication algorithm is faster than the
 naive one (which is O(n^2) in the number of bits).

 So in the case of elliptic curves it really depends on the efficiency
 of the addition steps and on how fast
 the numerators/denominators grow. I am sure the elliptic curve experts
 on this forum can say more about this.

Ah yes, I hadn't thought that through.

SAGE uses GMP for the multiplications, which is quasi-linear time; on
the other hand, the integer GCD code in GMP is still quadratic (as of
GMP 4.2.1), so what you're saying is still relevant, since in effect
we are really doing arithmetic over Q.

If you take a non-torsion point P on E(Q), then the bitsize of the
numerators/denominators of kP grows like k^2. So the bitsize of 2^m P
is about 4^m.

Say we compute 2^m P by repeated doubling.  If the arithmetic is
quadratic time, then computing 2^m P costs about 1 + 4^2 + 16^2 + ...
+ (4^m)^2, which is about 4^(2m) = 16^m. If the arithmetic is linear
time (including the GCDs), then computing 2^m P costs about 1 + 4 + 16
+ ... + 4^m, which is about 4^m (ignoring log factors).
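These geometric-series estimates are easy to sanity-check numerically;
a quick sketch (assuming the 4^j bitsize model above, with `exponent`
2 for quadratic arithmetic and 1 for linear):

```python
def doubling_cost(m, exponent):
    """Total cost of m doublings when the j-th doubling works on
    ~4^j bits and arithmetic on S bits costs S**exponent."""
    return sum((4 ** j) ** exponent for j in range(m + 1))

# Each sum is dominated by its last term, matching the estimates:
# quadratic arithmetic ~ 16^m, linear arithmetic ~ 4^m.
for m in (5, 10, 15):
    assert doubling_cost(m, 2) < 2 * 16 ** m
    assert doubling_cost(m, 1) < 2 * 4 ** m
```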

Now say we compute 2^m P as (P + (P + (P + ( ... (P + P))...))). Well,
I'm not sure exactly what happens here; it's a bit more complicated,
since at each stage you're adding something small to something big. I
guess to keep it simple we can pretend the small thing is a constant,
since we're assuming P is fixed. In the linear-time arithmetic case,
at step N you have to do at least N^2 work, so the total is 1 + 2^2 +
3^2 + ... + (2^m)^2, which is about 8^m; so repeated addition
definitely loses to repeated doubling.

In the quadratic-time arithmetic case, it's more complicated. Looking
at the formula for elliptic curve addition, it looks like to compute
P+Q you have to multiply coordinates of Q by coordinates of Q. Maybe
there's a way of rewriting it, I don't know. If this is true, it means
step N needs at least N^4 work, so the total is 1 + 2^4 + 3^4 + ... +
(2^m)^4, which is about 32^m; so repeated addition still loses to
repeated doubling. If, on the other hand, you can get away with
multiplying coordinates of Q only by coordinates of P (which is
assumed to be constant), then it's still 8^m, so repeated addition
beats repeated doubling.

So if my analysis is wrong, we need an elliptic curve expert; and if  
it's right, we need an elliptic curve expert!

(and in all cases it would be great if someone could address this  
ticket: http://www.sagemath.org:9002/sage_trac/ticket/424)

david





[sage-devel] Re: ticket #59 (optimize elliptic curve arithmetic)

2007-08-14 Thread David Harvey


On Aug 14, 2007, at 4:19 PM, Alex Ghitza wrote:

 Hi,

 I've looked at ticket #59, in which David Kohel says:


   William, my student noticed some slow performance with elliptic
   curves group law. I think there was a huge overhead in duplication:

   sage: E = EllipticCurve([GF(101)(1),3])
   sage: P = E([-1,1,1])
   sage: timeit 2*P
   100 loops, best of 3: 3.81 ms per loop
   sage: timeit P+P
   1000 loops, best of 3: 1.81 ms per loop

   Basically n*P was passing through all sorts of high-level layers for
   group schemes, abelian groups, and the like.

 It turns out that n*P is inherited from multiplication in
 AdditiveGroupElement, which itself goes all the way to ModuleElement.
 But at that point, it is implemented efficiently (by repeated
 doubling).  It seems to me that the loss of efficiency in comparison
 with P+P+...+P is all in the overhead.

 I experimented with this and noticed that P+...+P is faster than n*P
 as long as n is smaller than 10, then the repeated doubling starts to
 bear fruit.  So inside ell_point.py, I overloaded the multiplication
 operator, simply making it return P+...+P if n is less than 10, and
 return n*P if n is larger than 10 (see below).

 This is not a very elegant solution.  For one thing, the magic number
 10 that works on my architecture (Intel Core Duo running Gentoo)
 might not be right for a completely different machine.  Is there a
 way to automatically generate this number when the user compiles sage
 from source (by starting with a value like 100, say, and then doing a
 binary search until it hits the optimal number)?

Even on a single architecture, the correct crossover point will vary
between different base fields.

For example, if the base field is Q, the best crossover will depend on
the size of the coordinates. If the coordinates are enormous, it's
probably best to do repeated doubling as much as possible. Similarly,
if you are working over GF(p) for some very large p, you'll find a
different crossover from GF(101).

This is going to be a difficult problem in general. I think a better
approach is to try to reduce the overhead, i.e. to get the n*P pathway
working faster, and/or to reduce the overhead in the addition law
itself.

david
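The threshold approach Alex describes could be sketched as follows.
This is a hypothetical illustration, not the actual ell_point.py
patch: the names `smart_multiply` and `CROSSOVER` are invented, and 10
is the machine-dependent magic number from this thread, not a tuned
constant.

```python
CROSSOVER = 10  # machine-dependent magic number from the thread

def smart_multiply(n, P, add, zero):
    """Hybrid n*P: repeated addition below the crossover,
    double-and-add at or above it."""
    if n < CROSSOVER:
        result = zero
        for _ in range(n):           # small n: overhead-light path
            result = add(result, P)
        return result
    result, base = zero, P           # large n: double-and-add
    while n > 0:
        if n & 1:
            result = add(result, base)
        base = add(base, base)
        n >>= 1
    return result
```

As noted above, a single hard-coded crossover cannot be right for
every base field, which is why reducing the per-call overhead is the
more robust fix.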

