[sage-devel] Re: Enumeration of totally real fields, continued

William Stein Thu, 13 Sep 2007 21:08:03 -0700

On 9/13/07, John Voight <[EMAIL PROTECTED]> wrote:
>
> Hi Will,
>
> > > (2) I have my Cython code in a tr_data.spyx file and my Python code in
> > > a totallyreal.py file.  How do I include the former into the latter--
> > > or do I have to include both at the prompt?
> >
> > You can't trivially get at tr_data.spyx from totallyreal.py, but you can
> > from a .sage file, as illustrated in this example:
> >
> > [EMAIL PROTECTED]:~/tmp$ more a.py
> > def foo(n):
> >    return n*n
> >
> > [EMAIL PROTECTED]:~/tmp$ more b.spyx
> > def bar(m):
> >     return m+m
> > [EMAIL PROTECTED]:~/tmp$ more c.sage
> > load a.py
> > load b.spyx
> >
> > print foo(10)
> > print bar(20)
> [...]
>
> Got it.  I'd like to report a concern here.  If you name the file
> a.sage, then when you load a.sage, it *deletes* a.py and replaces it
> with an automatically generated file!  I did this and had to recover
> my file from a backup!
>
> (Incidentally, when I log into to trac to report such things, the
> username I have on sage.math doesn't allow me in; will you register
> me?)
>
> > > While you're at it, I need to be able to run an LLL-optimization
> > > routine to speed up these functions: the function is called polred().
>
> Just so this doesn't get forgotten--I think we could use an equivalent
> to OptimizedRepresentation(K) which runs an LLL on a maximal order Z_K
> under the usual Minkowski embedding into R^n.
>
> > > (7) What overhead happens in declaring a number field?
> >
> > Basically nothing.  The polynomial is checked for irreducibility and
> > that it is monic.  If you do
> >
> > sage: x = polygen(QQ,'x')
> > sage: %prun for i in range(100): k = NumberField(x**3 + x + 1393923489 + 
> > i,'a')
> >
> > you'll see that the time is dominated by coercing numbers into the 
> > rationals.
> > It's definitey not calling pari's nfinit at all.
>
> OK, if I treat a slightly different example:
>
> sage: def dumb_list_fields(B):
> ....:     for d in range(2,B):
> ....:         f = ZZx([d,0,1])
> ....:         K = NumberField(f, 't')
> ....:
> sage: time dumb_list_fields(10^4)
> CPU times: user 5.70 s, sys: 0.08 s, total: 5.78 s
> Wall time: 5.78
> sage: time dumb_list_fields(10^5)
> CPU times: user 56.72 s, sys: 0.88 s, total: 57.60 s
> Wall time: 57.57
>
> In Magma, these take 1/10 the time:
>
> > procedure DumbListFields(B);
> procedure> for d := 2 to B do
> procedure|for> f := Polynomial([d,0,1]);
> procedure|for> K := NumberField(f);
> procedure|for> end for;
> procedure> end procedure;
> > time DumbListFields(10^4);
> Time: 0.550
> > time DumbListFields(10^5);
> Time: 5.490
>
> That's a huge yikes for me.  (For your cubic examples, it's even
> worse: 1.57 s versus 0.020 s.)
>
> When I prun, I get:
> sage: prun dumb_list_fields(10^4)
>          949814 function calls in 9.019 CPU seconds
>
>    Ordered by: internal time
>
>    ncalls  tottime  percall  cumtime  percall
> filename:lineno(function)
>     59988    2.017    0.000    2.151    0.000 rational_field.py:
> 120(__call__)
>      9998    1.060    0.000    3.221    0.000
> polynomial_element_generic.py:519(__init__)
>      9998    0.808    0.000    8.314    0.001 number_field.py:
> 157(NumberField)
>     19996    0.744    0.000    4.347    0.000 polynomial_ring.py:
> 163(__call__)
> [...]
>
> So there's no hope: there's just overhead in the coercion here?


Nobody has ever spent any effort on optimizing the following in SAGE:
   - number field construction
   - If f is a polynomial over ZZ, constructing the corresponding
     polynomial over QQ (which is taking a substantial chunk of the time)

Here's an example of doing the above but 10 times faster
by circumventing some of the slow parts:

{{{id=0|
ZZx = ZZ['x']
QQx = QQ['x']
from sage.rings.number_field.number_field import NumberField_generic

def dumb_list_fields(B):
     for d in range(2,B):
         f = ZZx([d,0,1])
         #f = QQx(f)
         K = NumberField_generic(f, 'a', check=False, latex_name='a')
     return K
}}}

{{{id=1|
time k = dumb_list_fields(10^4)
///
CPU time: 0.62 s,  Wall time: 0.63 s
}}}

Note that with the line
   # f = QQx(f)
commented out, the f is a poly over ZZ instead of QQ, so some
functions on K simply won't work (though many will -- maybe enouhg
for you).

It is definitely possible to make
  f = QQx(f)
or better
   f = f.change_ring(QQ)
basically instant by just adding some code in the right place
written in Cython against the GMP C library.  I would do it
now, but I don't have the time (because of grant proposals).

> Perhaps in my limited situation, there is the workaround of calling
> the pari functions polisirred() and nfisisom() directly on the
> sequences--but will this have any less overhead?

If you do it right it will likely have much less overhead.

> > > Unless I am
> > > reading this incorrectly, it now dominates my computation now!
> >
> > No, pari's nfinit is dominated your computation.  That's maybe
> > because you might be using K1._pari_().nfisisom(K2) to check
> > isomorphism.  Doing  K1._pari_() creates the PARI number field
> > associated to K1, which is a very expensive operation:
> [...]
> > With the new K1.is_isomorphic(K2) code I just posted this
> > is completely removed, since it just calls paris nfisisom code
> > with the two polynomials that define the fields as input.
>
> As for the mysterious nfinit, even after I changed to the
> is_isomorphic call (thanks so much for spending those few minutes!), I
> found that calling discriminant() still apparently gets the full brunt
> of an nfinit:

That can easily be changed.  Instead of saying "apparently", you should
type
   K.discriminant??
and look at the source code:
                   self.__disc = QQ(str(self.pari_nf()[2]))
You see that self.pari_nf() is being called.
Changing this to

           self.__disc = QQ(str(self.pari_polynomial().nfdisc() ))

would vastly speed this up... that's sufficiently easy so I've done it.

>
> sage: def dumber_list_fields(B):
> ....:     for d in range(2,B):
> ....:         f = ZZx([d,0,1])
> ....:         K = NumberField(f, 't')
> ....:         K.discriminant()
> ....:
> sage: prun dumber_list_fields(10^4)
>          1269750 function calls in 18.327 CPU seconds
>
>    Ordered by: internal time
>
>    ncalls  tottime  percall  cumtime  percall
> filename:lineno(function)
>      9998    4.113    0.000    4.113    0.000 {method 'nfinit' of
> 'sage.libs.pari.gen.gen' objects}
>    129974    2.363    0.000    2.672    0.000 rational_field.py:
> 120(__call__)
>      9998    1.401    0.000    2.912    0.000 {method '_pari_' of
> 'sage.rings.polynomial.polynomial_element.Polynomial' objects}
>      9998    1.107    0.000    1.941    0.000
> polynomial_element_generic.py:519(__init__)
>     19996    1.020    0.000    3.390    0.000 polynomial_ring.py:
> 163(__call__)
>     19996    0.910    0.000    2.707    0.000
> polynomial_element_generic.py:879(list)
>
> And now Magma wins by a much larger factor:
>
> > procedure DumberListFields(B);
> procedure> for d := 2 to B do
> procedure|for> f := Polynomial([d,0,1]);
> procedure|for> K := NumberField(f);1]);
> procedure|for> _ := Discriminant(K);
> procedure|for> end for;
> procedure> end procedure;
> > time DumberListFields(10^4);
> Time: 0.550
>
> Any ideas here?  Presumably, one can calculate a maximal order and
> hence the discriminant without calculating all of the other sundry
> invariants.

With the 1-line change I mentioned above plus the optimized
creation SAGE takes 2 seconds to do this:

{{{id=0|
ZZx = ZZ['x']
QQx = QQ['x']
from sage.rings.number_field.number_field import NumberField_generic

def dumb_list_fields(B):
     for d in range(2,B):
         f = ZZx([d,0,1])
         #f = QQx(f)
         K = NumberField_generic(f, 'a', check=False, latex_name='a')
         D = K.discriminant()
}}}

{{{id=1|
time k = dumb_list_fields(10^4)
///
CPU time: 2.14 s,  Wall time: 2.16 s
}}}

Doing this just using PARI via SAGE takes < 0.5 seconds:

{{{id=9|
ZZx = ZZ['x']
QQx = QQ['x']
from sage.rings.number_field.number_field import NumberField_generic

def dumb_list_fields(B):
     for d in range(2,B):
         D = pari('x^2 - %s'%d).nfdisc()
}}}

{{{id=2|
time k = dumb_list_fields(10^4)
///
CPU time: 0.49 s,  Wall time: 0.50 s
}}}


> All in all, for what I've written, SAGE is about 3 times slower than
> Magma, and this is a real problem if in the end my code will take one
> month to run instead of three months!
>
> Thanks again for your help!

Nothing in SAGE is fast until somebody writes some real code that
uses it and then it gets optimized.  Exactly the same is true of Magma,
except that Magma has been through this cycle a lot more in over
the past 20 years.    The potential for how fast things can be in either
systems is fairly similar.  Actually perhaps in the long run SAGE has an
advantage because there are more SAGE developers that get to
look under the hood (at the compiled level) than with magma, SAGE
make use of any GPL'd code and Magma can't, and there's a huge
amount of  GPL'd code out there...

In short, without people like you doing what you're doing with
SAGE right now, it'll never get better.  Thanks!!

William

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---

[sage-devel] Re: Enumeration of totally real fields, continued

Reply via email to