Marco Bodrato <bodr...@mail.dm.unipi.it> writes:

> Well, I added one more move to order the cases as you suggest. The
> code gets a little bit shorter.

Thanks, looks good to me. I think one more instruction is easy to move,
see below.

> I also renamed registers, so that a push/pop couple is needed only if
> the loop is used; this may save a couple of cycles when the size is
> small. Does this make sense?

Makes sense.

> L(end):       mul     %r9
>       add     %rax, %r11
>       adc     %rdx, %r10
>       cmp     $1, R32(n)
>       ja      L(two)
>       jnz     L(nul)
>
>       mov     -8(ap), %rax      <-- 1

I think this instruction and the one marked "2" below can be moved to
the start of the L(one): part, just before the mul %r8 ("3" below).
Slightly worse scheduling, though.

>       mov     %r11, -16(rp)
>       mov     %r10, %r11
>       jmp     L(one)

I had hoped this jump and preceding instructions could be eliminated, to
get a structure like

        ja      L(two)
        jz      L(one)

L(nul): (no jumps to this label left)
        ...
        fall through 
L(one):
        ...
        fall through
L(two): 
        ...
        function exit

But that might need other move instructions, to get the right data into
the right registers?

> L(nul):       mov     -16(ap), %rax
>       mov     %r11, -24(rp)
>       mul     %r8
>       add     %rax, %r10
>       mov     -16(bp), %rax
>       mov     $0, R32(%r11)
>       adc     %rdx, %r11
>       mul     %r9
>       add     %rax, %r10
>       mov     -8(ap), %rax      <-- 2
>       adc     %rdx, %r11
>       mov     %r10, -16(rp)
> L(one):       mul     %r8       <-- 3
>       add     %rax, %r11
>       mov     -8(bp), %rax
>       mov     $0, R32(%r10)
>       adc     %rdx, %r10
>       mul     %r9
>       add     %rax, %r11
>       adc     %rdx, %r10
>
> L(two):       mov     %r11, -8(rp)
>       mov     %r10, %rax
> L(ret):       pop     %rbp
>       FUNC_EXIT()
>       ret
> EPILOGUE()

So I think your version is an improvement as is, and perhaps not worth
the effort to try to eliminate a few more instructions in this rather
obscure function.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
