2009/9/10 Bill Hart <goodwillh...@googlemail.com>:
>
> This program only takes 0.68s in C using a pretty naive mpz program on
> sage.math. I doubt the memory allocation is really relevant. The
> interpreter overhead is by far the greatest component
>
> Bill.
>
> On 9 Sep, 17:57, Nils Bruin <nbr...@sfu.ca> wrote:
>> Inspired by a little experiment we did to see if there is room to
>> improve to ECL's bignum performance, we ran a little experiment
>> computing fibonacci numbers (we wanted to test addition because it was
>> mainly ECL's memory management that was under consideration)
>>
>> with the following defs:
>>
>> def fibo(n):
>>   a=1
>>   b=1
>>   c=0
>>   for i in range(n):
>>     c=a
>>     a=a+b
>>     b=c
>>   return b
>>
>> def test(n,m):
>>   for i in range(m):
>>     _=fibo(n)

  This is an interesting test. I tested it in the language I
developing, and noticed it doesn't handle this special case very well,
besides simple, it is subtle in how it tests a language memory
management.

  Describing how it works in my language:
    c = a
(1)  if 'c' is a t_mpz, it calls mpz_set(c, a), else, it tags the
object pointed by 'a' as type t_shz (shared), and make 'c' also point
to it
    a = a + b
(2) if 'a' is of type t_mpz, it directly stores the result in 'a',
else if 'a' if a shared mpz_t (t_shz), it allocates a new mpz_t and
makes 'a' point to it (the result of the addition is always computed
in a per thread "accumulator")
    b = c
(3) same logic as 1.

  The interesting fact is that by using shared objects, in this case
it ends up always not changing 'a' in place, because when it's value
was assigned to 'c', it was retyped to t_shz (shared mpz_t). In other
words, it allocates one mpz_t per loop iteration.

  I made a simple build, where, if the type of an object is not t_mpz,
when assigned an mpz_t object, it would not make it shared, but always
make a copy, and in that case, it almost triplied the speed in my
interpreter. In other words, it used only 3 mpz_t objects per call to
"fibo".

  But the C version (somewhat simulating what a close to perfect high
level language would optimize this example to) is not that faster then
Python. Just for completeness, here are the tests I used (my computer
is not that fast, so I used 400 iterations for fibo 4000):

- C version -
#include <stdlib.h>
#include <gmp.h>

mpz_t *fibo(int n) {
    mpz_t *a, *b, *c;
    a = malloc(sizeof(mpz_t));
    b = malloc(sizeof(mpz_t));
    c = malloc(sizeof(mpz_t));
    mpz_init_set_ui(*a, 1);
    mpz_init_set_ui(*b, 1);
    mpz_init(*c);
    for (; n > 0; --n) {
        mpz_set(*c, *a);
        mpz_add(*a, *a, *b);
        mpz_set(*b, *c);
    }
    mpz_clear(*a);
    mpz_clear(*c);
    free(a);
    free(c);
    return b;
}

void test(int n, int m) {
    mpz_t *_;
    for (; m > 0; --m) {
        _ = fibo(n);
        mpz_clear(*_);
        free(_);
    }
}

int main(int argc, char *argv[])
{
    test(4000, 400);

    return 0;
}
-%<-

- Python version -
def fibo(n):
    a = 1
    b = 1
    c = 0
    for i in range(n):
        c = a
        a = a + b
        b = c
    return b

def test(n, m):
    for i in range(m):
        _ = fibo(n)

test(4000, 400)
-%<-

- Lisp version -
(defun fibo (n)
  (let ((a 1) (b 1) (c 0))
    (dotimes (i n)
      (setf c a)
      (incf a b)
      (setf b c))
    (return-from fibo b)))
(compile 'fibo)

(defun test (n m)
  (let (_)
    (dotimes (i m)
      (setf _ (fibo n)))))
(compile 'test)

(test 4000 400)
-%<-

- My Language (TM) version :-)
auto fibo(n) {
    auto a = 1, b = 1, c;
    for (; n > 0; --n) {
        c = a;
        a = a + b;
        b = c;
    }
    return b;
}

void test(n, m) {
    auto _;
    for (; m > 0; --m)
        _ = fibo(n);
}

test(4000, 400);
-%<-

One random run in my interpreter:
2.58user 0.04system 0:02.94elapsed
One random run in ecl:
3.69user 0.20system 0:05.09elapsed
One random run in clisp:
2.31user 0.78system 0:03.42elapsed
One random run in sbcl:
0.92user 0.05system 0:02.69elapsed
One random run in python:
1.08user 0.01system 0:01.38elapsed
One random run of the C binary:
0.54user 0.00system 0:00.60elapsed

  This is an interesting case, where not attempting to share the mpz_t
is better, and coupled with inplace store of arithmetic it becomes
quite good.

  It would be interesting some other kinds of tests also, for example,
while Python was pretty good in this special case, I have seen cases
where it is more then 10 times slower then the language I am
developing, like in a simple factorial, recursive or not, but I
believe it is because of Python using its own bignum library, that
doesn't have optimized multiplication.

  I used all iterations with mpz_t objects in the C version for
simplification, as it starts using them at around 1000 iterations,
otherwise, would need to simulate a high level language, that would
check for overflow before switching to use mpz_t's, etc. But if
unrolled to start using mpz_t objects without checking for overflow,
could reduce the time by up to 20%.

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---

Reply via email to