"se...@t-online.de" writes:
> There you see the original code on the left and the generated assembly in the
> middle.
> You can add the & to
> "=d" (__umul_tmp1)
> in the left window and you see immediately the change in the comiled output.
Ok, let me see if I understand the problem. The inline asm for umul_ppmm
is as follows:
#define umul_ppmm(xh, xl, a, b) \
do { USItype __umul_tmp1, __umul_tmp2;\
__asm__ ("| Inlined umul_ppmm\n"\
" move%.l %5,%3\n"\
" move%.l %2,%0\n"\
" move%.w %3,%1\n"\
" swap%3\n" \
" swap%0\n" \
" mulu%.w %2,%1\n"\
" mulu%.w %3,%0\n"\
" mulu%.w %2,%3\n"\
" swap%2\n" \
" mulu%.w %5,%2\n"\
" add%.l %3,%2\n"\
" jcc 1f\n" \
" add%.l %#0x1,%0\n" \
"1: move%.l %2,%3\n"\
" clr%.w %2\n" \
" swap%2\n" \
" swap%3\n" \
" clr%.w %3\n" \
" add%.l %3,%1\n"\
" addx%.l %2,%0\n"\
" | End inlined umul_ppmm"\
: "=&d" (xh), "=&d" (xl), \
"=d" (__umul_tmp1), "=&d" (__umul_tmp2) \
: "%2" ((USItype)(a)), "d" ((USItype)(b))); \
} while (0)
There are two instructions mentioning the input %5,
move%.l %5,%3 (first instruction)
mulu%.w %5,%2 (close to the middle)
The %2 register is both an input and output, and referenced in a few
places in between. My m68k assembly knowledge is very rusty, most of
them may be harmless reads, but the "swap %2" instruction just before
the use of %5 looks like it relies on %2 and %5 being distinct
registers? In the linked-to compiler output (listed as gcc-6.5.0b, using
-O1), the template is instantiated as
| Inlined umul_ppmm
move.l d5,d2
move.l d5,d0
move.w d2,d1
swapd2
swapd0
mulu.w d5,d1
mulu.w d2,d0
mulu.w d5,d2
swapd5
mulu.w d5,d5
add.l d2,d5
jcc 1f
add.l #0x1,d0
clr.w d5
swapd5
swapd2
clr.w d2
add.l d2,d1
addx.l d5,d0
| End inlined umul_ppmm
so it's quite clear that both %2 and %5 are mapped to the same register,
d5. Adding the suggested &, marking %2 as an "early clobber" output,
changes the instantiation to
| Inlined umul_ppmm
move.l d5,d2
move.l d6,d0
move.w d2,d1
swapd2
swapd0
mulu.w d6,d1
mulu.w d2,d0
mulu.w d6,d2
swapd6
mulu.w d5,d6
add.l d2,d6
jcc 1f
add.l #0x1,d0
clr.w d6
swapd6
swapd2
clr.w d2
add.l d2,d1
addx.l d6,d0
| End inlined umul_ppmm
I.e., %2 gets its own register, d6. If my analysis is right, the
critical difference is
swapd5
mulu.w d5,d5
vs
swapd6
mulu.w d5,d6
I can't say for sure if using the same register for %2 and %5 is
expected behavior of gcc. But to me it seems reasonable of gcc to try to
share a register when an inline asm template is instantiated with
identical expressions for two of the inputs (as in the squaring
umul_ppmm(xh, xl, u, u). And the additional & seems to be the documented
way to tell it to not do that for this asm template.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs