"se...@t-online.de" <se...@t-online.de> writes: >>I disabled the asm mult16x16 to hunt the bug and with the generic >> version it run well. So the problem had to be the asm. >>Here I compared the generated asm: http://franke.ms/cex/z/oG53bK and >> you see that only one register is used instead of two, since the >> modification is not recognized. >> >>/* here --> */ "=d" (__umul_tmp1), >> >>to >> >>/* here --> */ "=&d" (__umul_tmp1), >> >>does the magic.

To be clear, the meaning of this &, according to the docs, is to tell
gcc that the "output" register assigned to __umul_tmp1 can't overlap
the inputs.

If I read it correctly, __umul_tmp1 is %2 in the asm template, and the
b input is %5. I've forgotten most of what I knew about 68k assembly,
but it looks to me like %5 is used twice, and %2 is used in between,
which could be a problem if they're assigned the same register.

But I'm not sure how that would interact with "%2" ((USItype)(a)),
which, if I get it right, forces this input to be allocated in the same
register as the __umul_tmp1 output. The sqr_basecase function uses a
couple of calls like umul_ppmm(rp[11], lpl, ul, ul), so my best guess
is that we get all *three* of a, b, __umul_tmp1 allocated in the same
register.

If you could show the generated code (after gcc's register allocation)
*and* point out precisely where things go wrong, that would be helpful.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.