Will Partain <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] writes:
> 
> > ... The question is of performance for Int sized things in
> > Integer, so the fact that you call a good library isn't
> > relevant; what's important is what you do when you don't
> > _need_ to use GMP to get the answer.
> 
> It is relevant, because (unless my memory has faded badly...)
> the GMP folks optimize (you know, um, hand-tuned assembly
> language... :-) for the common case, which is Int-sized
> things in Integers.

I've just spent a fair bit of time looking through this code, and I
can't see any such optimization.

To add two Int-size things in an Integer (for simplicity, I'll say two
positive Int-size numbers whose result is also Int-size), ghc goes
through the following steps:

The ghc-generated code calls the GMP function mpz_add().
mpz_add():
        Checks the length (in words) of the two arguments.
        Checks the sign of the two arguments.
        Makes sure that the destination has enough space to hold two 
                words (in case of a possible carry), possibly 
                reallocating.
        Since the two arguments are the same length, and both
                positive, it calls mpn_add().
        Writes the carry (0) into the destination.
        Writes the length (in words) (1) into the destination.
mpn_add() (BTW, mpn_add() is an inline function, so you don't get
                the overhead of a real function call here):
        Calls mpn_add_n().
        Since both source arguments were the same length, it just
                returns without special carry handling.
mpn_add_n() (here's the hand-written, optimized assembly; I examined
                the x86 version):
        This wants to add two 1-word numbers in a fast, unrolled loop.
        It sets up various registers (size, and pointers to
                source1, source2, and destination).
        It computes the number of complete times to go through the
                loop (0).
        It computes the location to jump into the middle of the loop,
                to handle the one word.
        It adjusts the various pointer registers to compensate for the
                fact that it's jumping into the middle of the loop.
        It jumps into the middle of the loop.
        It does the addition.
        It increments the pointer registers.
        It notices that it's time to exit the loop, and exits.
        It computes the carry (0).
        It returns.

I am totally astonished that the Integer-vs-Int speed penalty is only
5x; I believe that (in the particular case of addition, at least) the
current slowdown could be reduced by a factor of at least 10 (so that
instead of taking 1x for the base operations and 4x for the Integer
operations, it would take 1x for the base operations and 0.4x for the
Integer operations, for a total of 1.4x).  (This is a wild guess, of
course, especially since I haven't looked at any operations other than
addition.)

> Idle speculation, yes...

Yes, idle speculation...

Carl Witty
[EMAIL PROTECTED]


Reply via email to