Dear William, I am adding a somewhat more detailed performance report below, comparing my own FLINT-based C code, MAGMA, SAGE 4.1.2.alpha2 and SAGE 4.1.2.alpha2 with the patch from trac ticket #4000.
To give an impression of the results: for very small examples MAGMA marginally beats my C implementation, but I am very confident that I could turn this around using a lazy evaluation approach for changing values away from 0/1 (with 0 and 1 then implemented as NULL/NULL). Already at this stage, the regular (resp. patched) SAGE code is out by a factor of 100 (resp. 10). In the last example, with degrees 100 and 140, the C code beats MAGMA by a factor of close to 10, and the regular SAGE code is a factor of around 150,000 slower (for addition; I did not wait for the multiplication to finish), that is to say, essentially unusable for repeated calculations. Fortunately, at this stage the patched SAGE code is less than a factor of 2 slower than the C code and clearly beats the MAGMA code.

By the way, I wouldn't want to say that the C code is heavily optimised at the moment. When writing it, I focussed on writing a clean interface while adding as little overhead as possible, thereby taking advantage of the speed that FLINT provides. Nonetheless, apart from using the lazy evaluation approach mentioned above and trying to improve the combined operation addmul (used for matrix operations), I do not have any ideas for improvement at the moment. But of course, I'd be happy to incorporate any suggestions.

As a final comment, I want to add that the performance of the patched SAGE code shouldn't be a surprise. As part of the work on trac ticket #4000, I had already written a class "UnivariateRationalRationalFunction(FractionFieldElement)" to represent a rational function over the rationals, based on Z[t], which in turn is already based on FLINT. It is thus to be expected that it runs at a similar speed, apart from a small constant factor.

Kind regards,

Sebastian

[Performance comparison]

Performance comparison between the FLINT-based C code, MAGMA (Magma V2.15-12), SAGE (4.1.2.alpha2), and SAGE (4.1.2.alpha2 with the changes from trac ticket #4000).
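As an aside before the figures, the lazy evaluation idea for 0/1 mentioned in the letter could look roughly as follows. This is only a sketch: all names are hypothetical, Python's Fraction stands in for the rational-function type, and the C version would use a NULL/NULL numerator/denominator pair instead of None.

```python
from fractions import Fraction

class LazyRat:
    """Value where the common cases 0 and 1 are stored lazily as sentinels.

    num is None encodes the special values (zero=True -> 0, zero=False -> 1);
    in C this would be NULL numerator/denominator pointers plus a flag,
    so that no allocation happens until the value changes away from 0/1.
    """
    def __init__(self, num=None, zero=True):
        self.num = num      # a Fraction, or None for the lazy 0/1 case
        self.zero = zero    # which special value None encodes

    def value(self):
        # Materialise the value only when actually needed.
        if self.num is None:
            return Fraction(0) if self.zero else Fraction(1)
        return self.num

    def mul(self, other):
        # Fast paths: 0*x = 0 and 1*x = x, with no arithmetic performed.
        if self.num is None:
            return LazyRat() if self.zero else other
        if other.num is None:
            return LazyRat() if other.zero else self
        return LazyRat(self.num * other.num)

one = LazyRat(zero=False)
x = LazyRat(Fraction(22, 7))
print(one.mul(x).value())        # 22/7, returned without touching x's data
print(LazyRat().mul(x).value())  # 0
```

The point is that multiplying by the identity returns the other operand unchanged, which is why the small-example overhead could plausibly be removed.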
MAGMA was run on the departmental machine, with eight CPUs with the following specifications

    model name : Dual-Core AMD Opteron(tm) Processor 8220
    cpu MHz    : 1000.000
    cache size : 1024 KB

and 32 GB of memory. The other three programs were run on my laptop, with two CPUs with the following specifications

    model name : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz
    cpu MHz    : 800.000
    cache size : 3072 KB

and 2 GB of memory.

We only compare addition and multiplication, based on three examples of different sizes. The number of repetitions only applies to the FLINT-based C code and MAGMA. For SAGE, we give the output of "timeit" and scale the number up by the number of repetitions to make it easily comparable. The C timings are done using the setup in the file qfmpz_poly-testp. The MAGMA code is executed as follows:

    F := FieldOfFractions(PolynomialRing(Rationals()));
    t := F.1;
    b := (3*t^2+2*t+1)/(t+10);
    c := (5*t+7)/13;
    t0 := Cputime();
    for i := 1 to 1000000 do
        a := b+c;
    end for;
    t1 := Cputime(t0);
    t1;

The SAGE code is timed with

    F.<t> = Frac(PolynomialRing(QQ, 't'))
    b = (3*t^2+2*t+1)/(t+10)
    c = (5*t+7)/13
    timeit('a = b+c')

In each case, we give the timings in the same order as the names of the programs at the top of this text.
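To make the scaled SAGE figures easy to check, the conversion is simply the timeit per-loop time multiplied by N. A sketch of the arithmetic only, using the medium-case numbers from the tables below:

```python
def scale(per_loop_seconds, repetitions):
    """Scale a timeit per-loop figure up to N repetitions, in seconds."""
    return per_loop_seconds * repetitions

# Medium case, N = 100,000: 252 ms per loop for plain SAGE addition,
# 270 µs per loop with the trac #4000 patch applied.
print(round(scale(252e-3, 100_000)))  # 25200 seconds, as in the table
print(round(scale(270e-6, 100_000)))  # 27 seconds, as in the table
```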
In the small case, we use N = 1,000,000 repetitions and

b = (3*t^2+2*t+1)/(t+10)
c = (5*t+7)/13

ADD    3.41     3.010    263 (625 loops, best of 3: 263 µs per loop)    50 (625 loops, best of 3: 50 µs per loop)
MUL    3.74     4.340    391 (625 loops, best of 3: 391 µs per loop)    58.2 (625 loops, best of 3: 58.2 µs per loop)

In the medium case, we use N = 100,000 repetitions and

b = (-19836522420*t^29+79346089680*t^28+57706247040*t^27+238038269040*t^26+158692179360*t^25+555422627760*t^24+5119102560*t^23-714114807120*t^22+52897393120*t^21+79346089680*t^20-79346089680*t^19-1983652242000*t^16-39673044840*t^14+39673044840*t^13+4959130605*t^12+79346089680*t^11+359032080*t^10-39673044840*t^6-119019134520*t^5+15869217936*t-26448696560)/
(158692179360*t^30+7934608968*t^29-793460896800*t^28+13224348280*t^27-1017257560*t^25+741552240*t^24+52897393120*t^23-79346089680*t^21-46285218980*t^20+29754783630*t^19+158692179360*t^18-555422627760*t^17-158692179360*t^15-39673044840*t^14-7213280880*t^11-39673044840*t^10+79346089680*t^9-449627841520*t^8-79346089680*t^7-5805811440*t^5-79346089680*t^4+79346089680*t^3-26448696560*t-634768717440)
c = (91560*t^30-183120*t^29+45780*t^27-30520*t^26-6226080*t^24-91560*t^22+366240*t^21+91560*t^20+45780*t^19+91560*t^17+45780*t^16-91560*t^15+137340*t^14-30520*t^13-30520*t^12-45780*t^11+274680*t^10+1007160*t^8-9156*t^7-274680*t^6-11445*t^4+9240*t^3-412020*t^2-39240*t+366240)/
(45780*t^30-91560*t^29-91560*t^28-91560*t^27-45780*t^25+22890*t^24+91560*t^23+91560*t^22+183120*t^21+183120*t^20+10071600*t^19-18312*t^18+549360*t^16-45780*t^14+549360*t^13-91560*t^12+274680*t^10-45780*t^9-91560*t^8+549360*t^7-45780*t^6+503580*t^5+91560*t^3-137340*t^2+457800*t+91560)

ADD    16.4     30.890    25200 (5 loops, best of 3: 252 ms per loop)      27 (625 loops, best of 3: 270 µs per loop)
MUL    19.43    47.670    3370000 (1 loops, best of 1: 33.7 s per loop)    34.4 (625 loops, best of 3: 344 µs per loop)

In the large case, we use N = 100,000 repetitions and

b = (t^99-5*t^98+4*t^96+t^95-t^94+4*t^93-8*t^92-310*t^91-
t^89+21*t^86-10*t^85+333*t^84-28*t^81+t^79+5*t^78-4*t^77+5*t^76-5*t^75-5*t^74- t^73+t^72-2*t^71+2*t^69+t^68+8*t^67- t^66-2*t^65+6*t^64+t^63+t^62+2*t^61+4*t^60+t^59+6*t^57-2*t^56+t^55-5*t^52-63*t^51-2*t^49- t^48+t^47-t^46+2*t^45+t^44- t^43+5*t^42-2*t^41+15*t^40+t^39-2*t^38+t^37-55*t^36+t^35+t^34-7*t^33- t^31+4*t^30-2*t^29+t^27+2*t^26-3*t^25+t^24-2*t^23+2*t^21+2*t^20+t^19+t^18- t^17-94*t^16+4*t^15+2*t^14-3*t^13-t^12+4*t^11+t^10-t^9+t^7+2*t^6+t^5- t^4)/(2*t^100- t^99-2*t^98+2*t^97+2*t^95+2*t^94+11*t^93+2*t^91+t^90+t^89+t^88- t^87+2*t^86+12*t^85-2*t^83+3*t^81+9*t^80+t^79-9*t^78- t^76+2*t^75+50*t^74-t^73+t^72-2*t^71-2*t^70- t^65+t^64+t^63+3*t^62-2*t^60+t^58-5*t^57-2*t^55-5*t^54+t^53-t^52-t^51- t^50-2*t^49+7*t^48+2*t^46+3*t^44+2*t^43+60*t^41- t^38+t^37+7*t^36+7*t^34-t^32+6*t^30-3*t^28+t^27+29*t^26-t^25- t^24+7*t^23+17*t^21-4*t^20+10*t^19+t^18+t^16-t^15+t^14+t^13- t^12+t^11+t^10+t^9+t^8-t^7+14*t^6-t^5-t^4+t^3-t^2-t +1) c = (-t^137-5*t^136-t^135-t^134-2*t^133+t^131+8*t^130-t^129- t^128-10*t^127+t^125+t^124+9*t^123-t^122-16*t^121+2*t^120-t^119- t^118+2*t^117-2*t^116-t^114-8*t^111+t^110+t^108+t^107- t^106+10*t^104-6*t^103- t^102+t^101+5*t^99-81*t^98+t^97-2*t^96+2*t^95-5*t^94+t^93-2*t^92+5*t^91+t^90- t^89+t^87-7*t^86-9*t^85-6*t^84-t^83+23*t^82-2*t^81-t^80+2*t^78- t^77-2*t^75-3*t^74+t^73+19*t^71-20*t^70+2*t^68-29*t^67-t^66-2*t^65- t^63+133*t^62+t^61+8*t^60+5*t^59- t^58-2*t^57+5*t^56+t^55-2*t^54-9*t^51-4*t^50+14*t^49-t^48- t^47+6*t^46+t^45+5*t^43+3*t^42-9*t^41+t^40-3*t^39+t^38+4*t^37+2*t^36+3*t^35-2*t^34- t^33-t^31-t^30-3*t^28-t^27+t^25+2*t^24-4*t^23+t^22+t^21-t^20- t^19+3*t^18+4*t^17+t^16-51*t^15-2*t^14-t^13- t^12-2*t^11+3*t^10+t^9-2*t^8+t^7-3*t^6+4*t^4+3*t^3-t^2)/ (t^137+t^136-3*t^135-t^134+2*t^133- t^132+20*t^131-3*t^129+t^128-2*t^126-t^125-t^124-2*t^123-t^121+2*t^120- t^119+t^118-77*t^116+t^115-t^113-4*t^110-2*t^109- t^108+t^106+4*t^105-2*t^103+t^102+t^101+10*t^100-23*t^99- t^98-22*t^97-2*t^94-7*t^93+t^91+2*t^90-2*t^89+2*t^87-6*t^86- t^85+t^84-2*t^82-28*t^81- 
t^79-176*t^78-9*t^77+t^76-6*t^75-3*t^74+t^73-23*t^72-6*t^71+10*t^70+t^69+2*t^68-2*t^67-4*t^66+37*t^64-t^63+2*t^62-2*t^61+t^60+t^59-2*t^58+67*t^56-10*t^55-8*t^54+25*t^53+2*t^52-5*t^51+t^50-2*t^49-t^48-t^47-2*t^46-5*t^45-t^43-t^42-t^41+5*t^39+2*t^35-4*t^34+t^33+3*t^32-t^30-3*t^29+t^28+2*t^27+2*t^26+t^25+t^24-t^23+t^22+3*t^21-t^20+t^19-3*t^17+4*t^15-t^14+4*t^13+2*t^11+t^9-t^8+t^7-t^4-t^3-t^2-12*t-17)

ADD    27.62    180.040    3980000 (1 loops, best of 1: 39.8 s per loop)    36.4 (625 loops, best of 3: 364 µs per loop)
MUL    35.79    285.680    NA (manually terminated before completion)       57 (625 loops, best of 3: 570 µs per loop)

[/Performance comparison]

On Nov 15, 2:05 am, William Stein <wst...@gmail.com> wrote:
> On Sat, Nov 14, 2009 at 5:59 PM, Sebastian Pancratz <s...@pancratz.org> wrote:
>
> > Dear all,
>
> > Some might remember that in September I put a lot of effort into
> > rewriting Q[t] to use FLINT.  While the patch (see trac #4000) was
> > very usable in practice already, despite the help of many people
> > there remained a few doctest failures throughout SAGE.
>
> Could you add something to this email about performance comparisons
> with Magma?  It's one thing to say "Sage is slow compared to my custom
> optimized C code" (no surprise), and another to say "Sage is 1000
> times slower than some easy-to-write interpreter code in Magma" (ouch,
> and is the case here, right?).
>
> William
>
> > In short, while using SAGE with my own PhD work (on computing
> > connection matrices) I came across a couple other issues with a high
> > overhead involved in using SAGE and so I decided to write plain C.
> > Currently, I am occupied with preparing my first year transfer
> > thesis, which I think serves the same purpose as the qualifying
> > exams in the US, but I am hoping that afterwards I can find the time
> > to finish the above ticket, if no-one else closes it before that.
> > In the meantime, I wanted to briefly describe the problems I ran
> > into using SAGE and let people know about some of the C code I have
> > written to overcome these problems (at least with regards to my own
> > work), in case someone else is interested.
>
> > The first problem concerns the rational function field over Q.
> > Using this via
>
> > F.<t> = Frac(PolynomialRing(QQ, 't'))
>
> > was slow without applying the patch at trac #4000, but even with
> > that applied there seemed to still be a very noticeable overhead due
> > to some category related code.  In fairness, I have to admit that I
> > did not go through too much trouble to chase down the exact problem.
> > But in any case I have written *much* faster C code based on FLINT
> > --- I include a link to this at the very end of this post.  I should
> > also point to my post on flint-devel
> >
> > http://groups.google.com/group/flint-devel/browse_thread/thread/43a57...
>
> > I think that this code is set up quite well and that other people
> > might be interested in this.
>
> > The second problem had to do with monomials in multivariate
> > polynomial rings.  Here perhaps I should first say that I have only
> > been working with three to five variables, that is to say, there is
> > no notion of asymptotic performance as the number of variables
> > increases.  My main motivation to look at this was that, on some
> > examples, my code for computing connection matrices made 5--100
> > million calls to monomial comparisons alone.  In SAGE, there was a
> > very noticeable improvement in performance at each step when moving
> > from using MPolynomial_polydict objects in a multivariate polynomial
> > ring to working directly with ETuple objects to finally working with
> > a Cython wrapper of a C array of ints.  In my C code, I have now
> > taken this a step further, representing an entire monomial as a
> > primitive type in a 64-bit word.
> > Benefits are the greatly simplified data structure and that a
> > couple of operations (multiplication, division, inverse
> > lexicographical comparison, to name the most prominent ones) reduce
> > to a simple operation on single words ("+", "-" and "<",
> > respectively).  There are a few downsides, in particular with
> > regards to generality.  For example, if one wants to use up to eight
> > variables and have a fixed number of bits per exponent regardless of
> > the number of variables (so that one can have implicit coercion
> > between $R[X_0, X_1] \subset \dotsb \subset R[X_0, \dotsc, X_7]$),
> > then one needs to guarantee that no computation involves individual
> > exponents greater than 255.  On the other hand, if one is working
> > with curves in three homogeneous variables, then this approach
> > allows exponents up to 65535, which will probably suffice for most
> > practical purposes.
>
> > Of course, this implementation is much faster than my previous SAGE
> > code.  But it is much more limited and I don't quite see it being of
> > much use in a general framework.  Nonetheless, in case someone is
> > interested in taking a look at this, I have uploaded it to my
> > domain, too.
>
> > I've uploaded these two implementations, together with full online
> > code documentation generated by doxygen, to my domain at
> >
> > http://www.pancratz.org/code/
>
> > Please don't be surprised by the simplistic design of the website;
> > so far I am mostly using it for private purposes.
> > Kind regards,
>
> > Sebastian
>
> --
> William Stein
> Associate Professor of Mathematics
> University of Washington
> http://wstein.org
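As a postscript on the packed-monomial representation described in the quoted message above, the idea can be sketched in a few lines of Python. This is a sketch under assumed parameters only (4 variables with 16 bits per exponent is one hypothetical layout); the actual C code operates on a single 64-bit word.

```python
BITS = 16                 # bits per exponent field; 4 variables fit in 64 bits
MASK = (1 << BITS) - 1

def pack(exponents):
    """Pack small non-negative exponents into one integer word.

    Variable i occupies bits [BITS*i, BITS*(i+1)), so the highest-index
    variable is the most significant field."""
    word = 0
    for i, e in enumerate(exponents):
        assert 0 <= e <= MASK     # no overflow into the neighbouring field
        word |= e << (BITS * i)
    return word

def unpack(word, nvars=4):
    """Recover the exponent tuple from a packed word."""
    return tuple((word >> (BITS * i)) & MASK for i in range(nvars))

# Monomial multiplication and division become single word additions and
# subtractions, since the 16-bit fields cannot carry into each other as
# long as no individual exponent overflows its field.
a = pack((1, 2, 0, 3))    # x0^1 * x1^2 * x3^3
b = pack((2, 0, 1, 0))    # x0^2 * x2^1
print(unpack(a + b))      # (3, 2, 1, 3): the exponent vector of the product
```

Comparing two packed words with "<" likewise gives a monomial order in one machine instruction, with the highest-index variable weighted most heavily in this layout.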