I timed a call to a PC-loading thunk against an rdpc instruction.
Approximate cycle counts:

       rdpc    thunk
US2      5       2
US3      6       6
T1       6      10

I assume US1=US2, US3=US4, and T1=T2.  US1 and US2 are the least relevant
machines, and the only ones where I could see a slowdown for rdpc.
T1 is also becoming irrelevant, more so than US3 and US4, I think.

T3 and T4 are of course quite relevant, so we should take these into
account.  If they run rdpc no slower than the thunk call, then we should
use rdpc unconditionally.

I used this test program:

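thunk:  retl                            ! return to caller
        mov     %o7, %g5                ! delay slot: copy call address to %g5
        .globl  main
main:   save    %sp, -176, %sp
        set     1593000000, %g1         ! loop counter
1:
! To time rdpc, uncomment the two rd lines and comment out the calls:
!       rd      %pc, %g5
!       rd      %pc, %g5
        call    thunk
         nop
        call    thunk
         nop
        brnz    %g1, 1b
        dec     %g1                     ! delay slot
        ret
        restore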

At http://docs.oracle.com/cd/E26502_01/html/E28387/gentextid-2583.html
Oracle assumes one uses rdpc.  They also seem to say that the gdop stuff
is for the 64-bit ABI, and now we use it in sparc32.

-- 
Torbjörn
_______________________________________________
gmp-devel mailing list
[email protected]
http://gmplib.org/mailman/listinfo/gmp-devel
