Here's the test code I was using, modeled on the basic operation
that longobject needs:  multiply two digits together and extract
high and low digits from the result.


typedef uint32_t digit;
typedef uint64_t twodigits;
#define SHIFT 30
#define MASK (((digit)1 << SHIFT) - 1)

/* multiply a and b, and split into high digit (returned)
   and low digit (put into *low) */

extern digit
digit_mul(digit *low, digit a, digit b) {
    twodigits prod;
    prod = (twodigits)a * b;
    *low = (digit)(prod & MASK);
    return (digit)(prod >> SHIFT);
}

Using gcc 4.0.1 on 32-bit x86 and compiling with "gcc -O1 -S test.c"
gives, for me, a file test.s that looks like:

        .text
.globl _digit_mul
_digit_mul:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %esi
        subl    $4, %esp
        movl    16(%ebp), %eax
        mull    12(%ebp)
        movl    %eax, %esi
        andl    $1073741823, %esi
        movl    8(%ebp), %ecx
        movl    %esi, (%ecx)
        shrdl   $30, %edx, %eax
        addl    $4, %esp
        popl    %esi
        leave
        ret
        .subsections_via_symbols

There's only a single mull instruction there, showing that gcc
is doing the right thing, at least when optimization is turned on.
Without optimization, gcc produces three separate
multiplications, two of which are multiplications
by zero.

But if I compile with the -m64 flag to force 64-bit then the
multiply becomes:

        imulq   %rsi, %rdx

which looks a lot like a 64 x 64 -> 64 multiply to me.  This
seems inefficient, when a 32 x 32 -> 64 bit multiply ought
to be good enough.  But maybe there isn't a significant
performance difference on x86_64?

Mark
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to