Here's the test code I was using, modeled on the basic operation that longobject needs: multiply two digits together and extract high and low digits from the result.

    #include <stdint.h>

    typedef uint32_t digit;
    typedef uint64_t twodigits;
    #define SHIFT 30
    #define MASK (((digit)1 << SHIFT) - 1)

    /* multiply a and b, and split into high digit (returned) and
       low digit (put into *low) */
    extern digit
    digit_mul(digit *low, digit a, digit b)
    {
        twodigits prod;
        prod = (twodigits)a * b;
        *low = (digit)(prod & MASK);
        return (digit)(prod >> SHIFT);
    }

Using gcc 4.0.1 on 32-bit x86 and compiling with "gcc -O1 -S test.c" gives, for me, a file test.s that looks like:

            .text
    .globl _digit_mul
    _digit_mul:
            pushl   %ebp
            movl    %esp, %ebp
            pushl   %esi
            subl    $4, %esp
            movl    16(%ebp), %eax
            mull    12(%ebp)
            movl    %eax, %esi
            andl    $1073741823, %esi
            movl    8(%ebp), %ecx
            movl    %esi, (%ecx)
            shrdl   $30, %edx, %eax
            addl    $4, %esp
            popl    %esi
            leave
            ret
            .subsections_via_symbols

There's only a single mull instruction there, showing that gcc is doing the right thing, at least when optimization is turned on. Without optimization, gcc produces three separate multiplications, two of which are multiplications by zero.

But if I compile with the -m64 flag to force 64-bit then the multiply becomes:

            imulq   %rsi, %rdx

which looks a lot like a 64 x 64 -> 64 multiply to me. This seems inefficient, when a 32 x 32 -> 64 bit multiply ought to be good enough. But maybe there isn't a significant performance difference on x86_64?

Mark
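
For anyone wanting to confirm that the split into high and low digits is exact, independently of which multiply instruction the compiler picks, here is a minimal sketch. It repeats digit_mul (made static here so the file stands alone) and adds two helpers, digit_mul_roundtrips and check_digit_mul, which are hypothetical names introduced for illustration and not part of the original test file. The check assumes both inputs fit in SHIFT bits, as longobject digits do.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t digit;
typedef uint64_t twodigits;
#define SHIFT 30
#define MASK (((digit)1 << SHIFT) - 1)

/* Same operation as digit_mul in the message: multiply a and b, return
   the high digit and store the low digit in *low. */
static digit digit_mul(digit *low, digit a, digit b)
{
    twodigits prod = (twodigits)a * b;
    *low = (digit)(prod & MASK);
    return (digit)(prod >> SHIFT);
}

/* Hypothetical helper (not in the original): returns 1 if the two digits
   recombine to the full product and each fits in SHIFT bits.
   Assumes a <= MASK and b <= MASK. */
static int digit_mul_roundtrips(digit a, digit b)
{
    digit low;
    digit high = digit_mul(&low, a, b);
    twodigits rebuilt = ((twodigits)high << SHIFT) | low;
    return rebuilt == (twodigits)a * b && low <= MASK && high <= MASK;
}

/* Hypothetical driver: checks a few edge cases, including the largest
   possible product of two 30-bit digits. */
static void check_digit_mul(void)
{
    digit low;
    digit high = digit_mul(&low, MASK, MASK);

    /* (2^30 - 1)^2 = (2^30 - 2) * 2^30 + 1 */
    assert(high == MASK - 1);
    assert(low == 1);

    assert(digit_mul_roundtrips(0, 0));
    assert(digit_mul_roundtrips(1, MASK));
    assert(digit_mul_roundtrips(MASK, MASK));
}
```

Calling check_digit_mul() from a main() and building once with -m32 and once with -m64 should give the same results either way; the two builds would differ only in the instructions emitted, not in the digits produced.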