Marco Bodrato <bodr...@mail.dm.unipi.it> writes:

> Well, I added one more move to order the cases as you suggest. The
> code gets a little bit shorter.

Thanks, looks good to me. I think one more instruction is easy to move,
see below.

> I also renamed registers, so that a push/pop couple is needed only if
> the loop is used; this may save a couple of cycles when the size is
> small. Does this make sense?

Makes sense.

> L(end): mul     %r9
>         add     %rax, %r11
>         adc     %rdx, %r10
>         cmp     $1, R32(n)
>         ja      L(two)
>         jnz     L(nul)
>
>         mov     -8(ap), %rax            <-- 1

I think this instruction and the one marked "2" below can be moved to
the start of the L(one): part, just before the mul %r8 ("3" below).
Slightly worse scheduling, though.

>         mov     %r11, -16(rp)
>         mov     %r10, %r11
>         jmp     L(one)

I had hoped this jump and the preceding instructions could be
eliminated, to get a structure like

        ja      L(two)
        jz      L(one)
L(nul): (no jumps to this label left)
        ...
        fall through
L(one):
        ...
        fall through
L(two):
        ...
        function exit

But that might need other move instructions, to get the right data into
the right registers?

> L(nul): mov     -16(ap), %rax
>         mov     %r11, -24(rp)
>         mul     %r8
>         add     %rax, %r10
>         mov     -16(bp), %rax
>         mov     $0, R32(%r11)
>         adc     %rdx, %r11
>         mul     %r9
>         add     %rax, %r10
>         mov     -8(ap), %rax            <-- 2
>         adc     %rdx, %r11
>         mov     %r10, -16(rp)
> L(one): mul     %r8                     <-- 3
>         add     %rax, %r11
>         mov     -8(bp), %rax
>         mov     $0, R32(%r10)
>         adc     %rdx, %r10
>         mul     %r9
>         add     %rax, %r11
>         adc     %rdx, %r10
>
> L(two): mov     %r11, -8(rp)
>         mov     %r10, %rax
> L(ret): pop     %rbp
>         FUNC_EXIT()
>         ret
> EPILOGUE()

So I think your version is an improvement as is, and it is perhaps not
worth the effort to try to eliminate a few more instructions in this
rather obscure function.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
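
For illustration only, the fall-through layout suggested above (ja
L(two), jz L(one), otherwise drop into L(nul)) can be modelled in C as a
switch with deliberate fall-through. This is a rough sketch of the
control flow alone: tail_trace is a hypothetical helper, not part of
GMP, and it says nothing about the register shuffling that the real
assembly might need to make the blocks line up.

```c
#include <string.h>

/* Hypothetical model (not GMP code): record which tail blocks execute
   for a remaining count n, following the order
   "ja L(two); jz L(one); L(nul): ... fall through".
   out must have room for at least 12 bytes. */
static void tail_trace(unsigned n, char *out)
{
    out[0] = '\0';
    switch (n) {
    case 0:
        strcat(out, "nul ");  /* L(nul): reached only by falling in, no jump */
        /* fall through */
    case 1:
        strcat(out, "one ");  /* L(one): one more multiply-accumulate step */
        /* fall through */
    default:
        strcat(out, "two");   /* L(two): final stores and function exit */
        break;
    }
}
```

Every path ends in the L(two) block, and no block needs a jump to reach
its successor, which is the property the proposed ordering is after.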