[sage-devel] Re: Implementation of Q(t) and monomials

Bill Hart Mon, 16 Nov 2009 04:42:46 -0800

On 15 Nov, 05:32, William Stein <wst...@gmail.com> wrote:
> On Sat, Nov 14, 2009 at 9:00 PM, Sebastian Pancratz <s...@pancratz.org> wrote:
>
> > Dear William,
>
> > I am adding a somewhat more detailed performance report below,
> > comparing my own FLINT-based C code, MAGMA, SAGE 4.1.2.alpha2 and SAGE
> > 4.1.2.alpha2 with the patch from trac ticket #4000.
>
> > To give an impression of the results, for very small examples MAGMA
> > marginally beats my C implementation, but I am very confident that I
> > could change this around using a lazy evaluation approach for changing
> > values away from 0/1 (then to be implemented as NULL/NULL).  At this
> > stage already, the regular (resp. patched) SAGE code is out by a
> > factor of 100 (resp. 10).  In the last example with degrees 100 and
> > 140, the C code beats MAGMA by a factor of close to 10 and the regular
> > SAGE code is a factor of around 150,000 slower (for addition; the I
> > did not wait for the multiplication to finish), that is to say,
> > essentially unusable for repeated calculations.  Fortunately, at this
> > stage, the patched SAGE code is less than a factor of 2 slower than
> > the C code and clearly beats the MAGMA code.
>
> > By the way, I wouldn't want to say that the C code is heavily
> > optimised at the moment.  When writing it, I have focussed on writing
> > a clean interface while adding as little overhead as possible, thereby
> > taking advantage of the speed that FLINT provides.  Nonetheless, apart
> > from using the lazy evaluation approach mentioned above and trying to
> > improve the combined operation addmul (used for matrix operations), I
> > do not have any ideas for improvement at the moment.  But of course,
> > I'd be happy to incorporate any suggestions.
>
> > As a final comment, I want to add that the performance of the patched
> > SAGE code shouldn't be a surprise.  As part of the work on trac ticket
> > #4000, I had already written a class
> > "UnivariateRationalRationalFunction(FractionFieldElement)" to
> > represent a rational function over the rationals, based on Z[t], which
> > in turn is based on FLINT already.  It is thus to be expected that it
> > should run at a similar speed apart from a small constant factor.
>
> > Kind regards,
>
> > Sebastian
>
> > [Performance comparison]
> > Performance comparison between
>
> >    FLINT-based C code,
> >    MAGMA (Magma V2.15-12),
> >    SAGE (4.1.2.alpha2),
> >    SAGE (4.1.2.alpha2 with changes along trac ticket #4000).
>
> > MAGMA was run on the departmental machine, with eight CPUs with the
> > following specifications
>
> >    model name      : Dual-Core AMD Opteron(tm) Processor 8220
> >    cpu MHz         : 1000.000
>
> Why do you say "1000" MHZ when that particular processor is a 2800Mhz
> (=2.8Ghz) processor?
>
> >    cache size      : 1024 KB
>
> > and 32GB of memory.
>
> > The other three programs were run on my laptop, with two CPUs with the
> > following specifications
>
> >    model name      : Intel(R) Core(TM)2 Duo CPU     P8400  @ 2.26GHz
> >    cpu MHz         : 800.000
>
> That cpu MHz of 800 makes no sense, given that your CPU is 2.26GHz.
> I'm just curious.
>
>
>
>
>
> >    cache size      : 3072 KB
>
> > and 2GB of memory.
>
> > We only compare addition and multiplication, based on three examples
> > of different sizes.  The number of repetitions only applies to the
> > FLINT-based C code and MAGMA.  We give the output of "timeit" for SAGE
> > and scale the number up by the number of repetitions to make it easily
> > comparable.
>
> > The C timings are done using setup in the file qfmpz_poly-testp.
>
> > The MAGMA code is executed as follows:
>
> >    F := FieldOfFractions(PolynomialRing(Rationals()));
> >    t := F.1;
> >    b := (3*t^2+2*t+1)/(t+10);
> >    c := (5*t+7)/13;
> >    t0 := Cputime();
> >    for i:=1 to 1000000 do
> >    for> a := b+c;
> >    for> end for;
> >    t1 := Cputime(t0);
> >    t1;
>
> > The SAGE code is time with
>
> >    F.<t> = Frac(PolynomialRing(QQ, 't'))
> >    b = (3*t^2+2*t+1)/(t+10)
> >    c = (5*t+7)/13
> >    timeit('a = b+c')
>
> > In each case, we give the timings in the same order as the name of the
> > programs at the top of this text.
>
> > In the small case, we use N = 1,000,000 repetitions and
>
> > b = (3*t^2+2*t+1)/(t+10)
> > c = (5*t+7)/13
>
> > ADD
> > 3.41
> > 3.010
> > 263 (625 loops, best of 3: 263 µs per loop)
> > 50 (625 loops, best of 3: 50 µs per loop)
>
> > MUL
> > 3.74
> > 4.340
> > 391 (625 loops, best of 3: 391 µs per loop)
> > 58.2 (625 loops, best of 3: 58.2 µs per loop)
>
> Via the PARI C library:
>
> sage: t=pari('t')
> sage: b = (3*t^2+2*t+1)/(t+10)
> sage: c = (5*t+7)/13
> sage: timeit('a = b+c')
> 625 loops, best of 3: 8.06 µs per loop
> sage: timeit('a=b*c')
> 625 loops, best of 3: 10.3 µs per loop
>
>
>
>
>
>
>
> > In the medium case, we use N = 100,000 repetitions and
>
> > b =
> > (-19836522420*t^29+79346089680*t^28+57706247040*t^27+238038269040*t^26+1586 
> > 92179360*t^25+555422627760*t^24+5119102560*t^23-714114807120*t^22+528973931 
> > 20*t^21+79346089680*t^20-79346089680*t^19-1983652242000*t^16-39673044840*t^ 
> > 14+39673044840*t^13+4959130605*t^12+79346089680*t^11+359032080*t^10-3967304 
> > 4840*t^6-119019134520*t^5+15869217936*t-26448696560)/
> > (158692179360*t^30+7934608968*t^29-793460896800*t^28+13224348280*t^27-10172 
> > 57560*t^25+741552240*t^24+52897393120*t^23-79346089680*t^21-46285218980*t^2 
> > 0+29754783630*t^19+158692179360*t^18-555422627760*t^17-158692179360*t^15-39 
> > 673044840*t^14-7213280880*t^11-39673044840*t^10+79346089680*t^9-44962784152 
> > 0*t^8-79346089680*t^7-5805811440*t^5-79346089680*t^4+79346089680*t^3-264486 
> > 96560*t-634768717440)
> > c =
> > (91560*t^30-183120*t^29+45780*t^27-30520*t^26-6226080*t^24-91560*t^22+36624 
> > 0*t^21+91560*t^20+45780*t^19+91560*t^17+45780*t^16-91560*t^15+137340*t^14-3 
> > 0520*t^13-30520*t^12-45780*t^11+274680*t^10+1007160*t^8-9156*t^7-274680*t^6 
> > -11445*t^4+9240*t^3-412020*t^2-39240*t
> > +366240)/
> > (45780*t^30-91560*t^29-91560*t^28-91560*t^27-45780*t^25+22890*t^24+91560*t^ 
> > 23+91560*t^22+183120*t^21+183120*t^20+10071600*t^19-18312*t^18+549360*t^16- 
> > 45780*t^14+549360*t^13-91560*t^12+274680*t^10-45780*t^9-91560*t^8+549360*t^ 
> > 7-45780*t^6+503580*t^5+91560*t^3-137340*t^2+457800*t
> > +91560)
>
> > ADD
> > 16.4
> > 30.890
> > 25200 (5 loops, best of 3: 252 ms per loop)
> > 27 (625 loops, best of 3: 270 µs per loop)
>
> > MUL
> > 19.43
> > 47.670
> > 3370000 (1 loops, best of 1: 33.7 s per loop)
> > 34.4 (625 loops, best of 3: 344 µs per loop)
>
> Did you try using the Sage PARI C library.  It's reasonably good at
> arithmetic in K(t) for K=Q and F_q.   On my 2.1Ghz laptop I get the
> following on the above example:
>
> t=pari('t')
> a, b as above
> sage: timeit('a = b+c')
> 21.5    (625 loops, best of 3: 215 µs per loop)
> sage: timeit('a = b*c')
> 22.8    (625 loops, best of 3: 228 µs per loop)
>
> Or am I timing the wrong thing?
>
>
>
>
>
>
>
> > In the large case, we use N = 100,000 repetitions and
>
> > b = (t^99-5*t^98+4*t^96+t^95-t^94+4*t^93-8*t^92-310*t^91-
> > t^89+21*t^86-10*t^85+333*t^84-28*t^81+t^79+5*t^78-4*t^77+5*t^76-5*t^75-5*t^ 
> > 74-
> > t^73+t^72-2*t^71+2*t^69+t^68+8*t^67-
> > t^66-2*t^65+6*t^64+t^63+t^62+2*t^61+4*t^60+t^59+6*t^57-2*t^56+t^55-5*t^52-6 
> > 3*t^51-2*t^49-
> > t^48+t^47-t^46+2*t^45+t^44-
> > t^43+5*t^42-2*t^41+15*t^40+t^39-2*t^38+t^37-55*t^36+t^35+t^34-7*t^33-
> > t^31+4*t^30-2*t^29+t^27+2*t^26-3*t^25+t^24-2*t^23+2*t^21+2*t^20+t^19+t^18-
> > t^17-94*t^16+4*t^15+2*t^14-3*t^13-t^12+4*t^11+t^10-t^9+t^7+2*t^6+t^5-
> > t^4)/(2*t^100-
> > t^99-2*t^98+2*t^97+2*t^95+2*t^94+11*t^93+2*t^91+t^90+t^89+t^88-
> > t^87+2*t^86+12*t^85-2*t^83+3*t^81+9*t^80+t^79-9*t^78-
> > t^76+2*t^75+50*t^74-t^73+t^72-2*t^71-2*t^70-
> > t^65+t^64+t^63+3*t^62-2*t^60+t^58-5*t^57-2*t^55-5*t^54+t^53-t^52-t^51-
> > t^50-2*t^49+7*t^48+2*t^46+3*t^44+2*t^43+60*t^41-
> > t^38+t^37+7*t^36+7*t^34-t^32+6*t^30-3*t^28+t^27+29*t^26-t^25-
> > t^24+7*t^23+17*t^21-4*t^20+10*t^19+t^18+t^16-t^15+t^14+t^13-
> > t^12+t^11+t^10+t^9+t^8-t^7+14*t^6-t^5-t^4+t^3-t^2-t
> > +1)
> > c = (-t^137-5*t^136-t^135-t^134-2*t^133+t^131+8*t^130-t^129-
> > t^128-10*t^127+t^125+t^124+9*t^123-t^122-16*t^121+2*t^120-t^119-
> > t^118+2*t^117-2*t^116-t^114-8*t^111+t^110+t^108+t^107-
> > t^106+10*t^104-6*t^103-
> > t^102+t^101+5*t^99-81*t^98+t^97-2*t^96+2*t^95-5*t^94+t^93-2*t^92+5*t^91+t^9 
> > 0-
> > t^89+t^87-7*t^86-9*t^85-6*t^84-t^83+23*t^82-2*t^81-t^80+2*t^78-
> > t^77-2*t^75-3*t^74+t^73+19*t^71-20*t^70+2*t^68-29*t^67-t^66-2*t^65-
> > t^63+133*t^62+t^61+8*t^60+5*t^59-
> > t^58-2*t^57+5*t^56+t^55-2*t^54-9*t^51-4*t^50+14*t^49-t^48-
> > t^47+6*t^46+t^45+5*t^43+3*t^42-9*t^41+t^40-3*t^39+t^38+4*t^37+2*t^36+3*t^35 
> > -2*t^34-
> > t^33-t^31-t^30-3*t^28-t^27+t^25+2*t^24-4*t^23+t^22+t^21-t^20-
> > t^19+3*t^18+4*t^17+t^16-51*t^15-2*t^14-t^13-
> > t^12-2*t^11+3*t^10+t^9-2*t^8+t^7-3*t^6+4*t^4+3*t^3-t^2)/
> > (t^137+t^136-3*t^135-t^134+2*t^133-
> > t^132+20*t^131-3*t^129+t^128-2*t^126-t^125-t^124-2*t^123-t^121+2*t^120-
> > t^119+t^118-77*t^116+t^115-t^113-4*t^110-2*t^109-
> > t^108+t^106+4*t^105-2*t^103+t^102+t^101+10*t^100-23*t^99-
> > t^98-22*t^97-2*t^94-7*t^93+t^91+2*t^90-2*t^89+2*t^87-6*t^86-
> > t^85+t^84-2*t^82-28*t^81-
> > t^79-176*t^78-9*t^77+t^76-6*t^75-3*t^74+t^73-23*t^72-6*t^71+10*t^70+t^69+2* 
> > t^68-2*t^67-4*t^66+37*t^64-
> > t^63+2*t^62-2*t^61+t^60+t^59-2*t^58+67*t^56-10*t^55-8*t^54+25*t^53+2*t^52-5 
> > *t^51+t^50-2*t^49-
> > t^48-t^47-2*t^46-5*t^45-t^43-t^42-
> > t^41+5*t^39+2*t^35-4*t^34+t^33+3*t^32-
> > t^30-3*t^29+t^28+2*t^27+2*t^26+t^25+t^24-t^23+t^22+3*t^21-
> > t^20+t^19-3*t^17+4*t^15-t^14+4*t^13+2*t^11+t^9-t^8+t^7-t^4-t^3-
> > t^2-12*t-17)
>
> > ADD
> > 27.62
> > 180.040
> > 3980000 (1 loops, best of 1: 39.8 s per loop)
> > 36.4 (625 loops, best of 3: 364 µs per loop)
>
> > MUL
> > 35.79
> > 285.680
> > NA (Manually terminated before completion)
> > 57 (625 loops, best of 3: 570 µs per loop)
>
> In PARI:
>
> sage: timeit('a = b+c')
> 125 loops, best of 3: 1.98 ms per loop
> sage: timeit('a = b*c')
> 125 loops, best of 3: 3.11 ms per loop
>
> There FLINT is beating PARI's slower multiplication.
>

I don't follow. Here the timings for PARI are in milliseconds, above
the timings for FLINT are in 10's of microseconds. FLINT is also
beating Pari for the addition, and by a factor of 6 or something. Or
am I looking at the wrong timings.

Bill.
--~--~---------~--~----~------------~-------~--~----~
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
[sage-devel] Re: Implementation of Q(t) and monomials

Reply via email to