2009/9/10 Bill Hart <goodwillh...@googlemail.com>:
>
> This program only takes 0.68s in C using a pretty naive mpz program on
> sage.math. I doubt the memory allocation is really relevant. The
> interpreter overhead is by far the greatest component
>
> Bill.
>
> On 9 Sep, 17:57, Nils Bruin <nbr...@sfu.ca> wrote:
>> Inspired by a little experiment we did to see if there is room to
>> improve ECL's bignum performance, we ran a little experiment
>> computing fibonacci numbers (we wanted to test addition because it was
>> mainly ECL's memory management that was under consideration)
>>
>> with the following defs:
>>
>> def fibo(n):
>>     a = 1
>>     b = 1
>>     c = 0
>>     for i in range(n):
>>         c = a
>>         a = a + b
>>         b = c
>>     return b
>>
>> def test(n,m):
>>     for i in range(m):
>>         _ = fibo(n)
This is an interesting test. I tried it in the language I am developing
and noticed that it does not handle this particular case very well.
Although the test is simple, it is subtle in how it exercises a
language's memory management. Here is how it works in my language:

(1) c = a
    If 'c' is a t_mpz, it calls mpz_set(c, a); otherwise it tags the
    object pointed to by 'a' as type t_shz (shared) and makes 'c' point
    to it as well.

(2) a = a + b
    If 'a' is of type t_mpz, the result is stored directly in 'a'; if
    'a' is a shared mpz_t (t_shz), a new mpz_t is allocated and 'a' is
    made to point to it. (The result of the addition is always computed
    in a per-thread "accumulator".)

(3) b = c
    Same logic as (1).

The interesting fact is that, because of object sharing, 'a' is never
updated in place in this example: as soon as its value is assigned to
'c', the object is retyped to t_shz (shared mpz_t). In other words, one
mpz_t is allocated per loop iteration.

I made a simple build in which an object that is not a t_mpz never
becomes shared when an mpz_t is assigned to it, but always receives a
copy instead, and that change almost tripled the speed of my
interpreter: only 3 mpz_t objects are used per call to "fibo". Still,
the C version (which roughly simulates what a near-perfect high-level
language would optimize this example to) is not that much faster than
Python.
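To make the allocation difference concrete, here is a small Python model
of the two assignment policies. The names (t_shz, the accumulator) follow
the description above, but the counting code itself is an illustration I
wrote for this message, not the real interpreter:

```python
# A simplified model of the two assignment policies described above.
# 'shared' plays the role of the t_shz tag; new_cell() stands in for
# allocating a fresh mpz_t.

class Cell:
    """A mutable bignum cell, like an mpz_t with a sharing tag."""
    __slots__ = ("value", "shared")
    def __init__(self, value):
        self.value = value
        self.shared = False

def fibo_allocs(n, share):
    """Run the fibo loop under one policy; return (result, allocations)."""
    allocs = 0
    def new_cell(v):
        nonlocal allocs
        allocs += 1
        return Cell(v)
    a, b, c = new_cell(1), new_cell(1), new_cell(0)
    for _ in range(n):
        if share:                # c = a: tag a's object t_shz, alias it
            a.shared = True
            c = a
        else:                    # c = a: eager copy, like mpz_set(c, a)
            c.value = a.value
        acc = a.value + b.value  # a = a + b via the "accumulator"
        if a.shared:
            a = new_cell(acc)    # shared: must allocate a fresh mpz_t
        else:
            a.value = acc        # t_mpz: store the result in place
        if share:                # b = c: same aliasing logic as c = a
            c.shared = True
            b = c
        else:
            b.value = c.value
    return b.value, allocs

# Sharing allocates one fresh mpz_t per iteration; copying uses only
# the initial three cells.
print(fibo_allocs(10, share=True))   # (89, 13)
print(fibo_allocs(10, share=False))  # (89, 3)
```

Under the sharing policy the allocation count grows with n, which is
exactly the per-iteration allocation described above.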
Just for completeness, here are the tests I used (my computer is not
that fast, so I used 400 iterations of fibo(4000)):

- C version -
#include <stdlib.h>
#include <gmp.h>

mpz_t *fibo(int n)
{
    mpz_t *a, *b, *c;
    a = malloc(sizeof(mpz_t));
    b = malloc(sizeof(mpz_t));
    c = malloc(sizeof(mpz_t));
    mpz_init_set_ui(*a, 1);
    mpz_init_set_ui(*b, 1);
    mpz_init(*c);
    for (; n > 0; --n) {
        mpz_set(*c, *a);
        mpz_add(*a, *a, *b);
        mpz_set(*b, *c);
    }
    mpz_clear(*a);
    mpz_clear(*c);
    free(a);
    free(c);
    return b;
}

void test(int n, int m)
{
    mpz_t *_;
    for (; m > 0; --m) {
        _ = fibo(n);
        mpz_clear(*_);
        free(_);
    }
}

int main(int argc, char *argv[])
{
    test(4000, 400);
    return 0;
}
-%<-

- Python version -
def fibo(n):
    a = 1
    b = 1
    c = 0
    for i in range(n):
        c = a
        a = a + b
        b = c
    return b

def test(n, m):
    for i in range(m):
        _ = fibo(n)

test(4000, 400)
-%<-

- Lisp version -
(defun fibo (n)
  (let ((a 1) (b 1) (c 0))
    (dotimes (i n)
      (setf c a)
      (incf a b)
      (setf b c))
    (return-from fibo b)))
(compile 'fibo)

(defun test (n m)
  (let (_)
    (dotimes (i m)
      (setf _ (fibo n)))))
(compile 'test)

(test 4000 400)
-%<-

- My Language (TM) version :-)
auto fibo(n)
{
    auto a = 1, b = 1, c;
    for (; n > 0; --n) {
        c = a;
        a = a + b;
        b = c;
    }
    return b;
}

void test(n, m)
{
    auto _;
    for (; m > 0; --m)
        _ = fibo(n);
}

test(4000, 400);
-%<-

One random run in my interpreter:
2.58user 0.04system 0:02.94elapsed
One random run in ecl:
3.69user 0.20system 0:05.09elapsed
One random run in clisp:
2.31user 0.78system 0:03.42elapsed
One random run in sbcl:
0.92user 0.05system 0:02.69elapsed
One random run in python:
1.08user 0.01system 0:01.38elapsed
One random run of the C binary:
0.54user 0.00system 0:00.60elapsed

This is an interesting case where not attempting to share the mpz_t is
better; coupled with in-place stores for arithmetic it becomes quite
good.
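For anyone who wants to reproduce the Python number without time(1),
the same benchmark can be driven from the standard timeit module. This
is just a sketch; the 4000/400 counts match the runs above, so scale
them down on a slow machine:

```python
# Reproduce the Python benchmark above with the stdlib timeit module
# instead of an external time(1) measurement.
import timeit

def fibo(n):
    a = 1
    b = 1
    c = 0
    for i in range(n):
        c = a
        a = a + b
        b = c
    return b

# Time 400 calls to fibo(4000), as in the runs above.
elapsed = timeit.timeit(lambda: fibo(4000), number=400)
print(f"fibo(4000) x 400: {elapsed:.2f}s")
```

timeit reports wall-clock time for the whole batch, so it corresponds
roughly to the "elapsed" column of the runs above rather than the user
time.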
It would also be interesting to try some other kinds of tests. While
Python was pretty good in this particular case, I have seen cases where
it is more than 10 times slower than the language I am developing, for
example a simple factorial, recursive or not; I believe that is because
Python uses its own bignum library, which does not have optimized
multiplication.

For simplification I used mpz_t objects for all iterations in the C
version, since they become necessary at around 1000 iterations anyway;
otherwise it would need to simulate a high-level language that checks
for overflow before switching to mpz_t's, etc. But a version unrolled
to start with machine integers and only switch to mpz_t objects past
the overflow point could reduce the time by up to 20%.
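The overflow check mentioned above can be sketched in a few lines of
Python. The helper name is hypothetical, and I assume 64-bit unsigned
machine words; the exact crossover point depends on the fixnum
representation a given runtime uses:

```python
# Sketch of the overflow check a high-level runtime would perform
# before switching to mpz_t's: stay on the machine-word fast path
# while the next addition still fits, promote to a bignum otherwise.
FIXNUM_MAX = 2**64 - 1  # assumption: 64-bit unsigned machine words

def steps_before_promotion(limit=FIXNUM_MAX):
    a = b = 1
    steps = 0
    while a + b <= limit:   # would the next addition overflow?
        a, b = b, a + b     # no: keep using machine integers
        steps += 1
    return steps            # the next addition must promote to mpz_t

print(steps_before_promotion())  # 91 additions fit in a 64-bit word
```

In C the same test can be done without the extra comparison using a
checked-add intrinsic, which is what makes the unrolled fast path cheap.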