I timed a call to a PC-loading thunk against an rdpc instruction.
Approximate cycle counts:

        rdpc   thunk
  US2     5      2
  US3     6      6
  T1      6     10

I assume US1=US2, US3=US4, and T1=T2. US1 and US2 are the least relevant
machines, and the only ones where I could see a slowdown for rdpc.
T1 is also becoming irrelevant, more so than US3 and US4, I think.
T3 and T4 are of course quite relevant, so we should take these into
account. If they run rdpc no slower than the thunk call, then we should
use rdpc unconditionally.
I used this test program:
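
thunk:  retl
         mov    %o7, %g5        ! delay slot: %o7 holds the address of the call

        .globl  main
main:   save    %sp, -176, %sp
        set     1593000000, %g1 ! loop count
1:
!       rd      %pc, %g5        ! rdpc variant, disabled here
!       rd      %pc, %g5
        call    thunk
         nop
        call    thunk
         nop
        brnz    %g1, 1b
         dec    %g1             ! delay slot: decrement the loop counter
        ret
         restore
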
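
For the rdpc measurement, the loop would presumably look like the sketch
below. It assumes the variant is obtained just by enabling the two rd lines
and removing the thunk calls; the loop count and the %g5 destination are
carried over unchanged, and the assembler has to accept V9 instructions.

```
        .globl  main
main:   save    %sp, -176, %sp
        set     1593000000, %g1 ! same loop count as the thunk version
1:
        rd      %pc, %g5        ! read the PC directly, no call needed
        rd      %pc, %g5
        brnz    %g1, 1b
         dec    %g1             ! delay slot: decrement the loop counter
        ret
         restore
```

Both versions load the PC twice per iteration, so the difference in run time
between them reflects the relative cost of rdpc and the thunk call.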
At http://docs.oracle.com/cd/E26502_01/html/E28387/gentextid-2583.html
Oracle assumes one uses rdpc. They also seem to say that the gdop stuff
is for the 64-bit ABI, and now we use it in sparc32.
--
Torbjörn