On 11/09/16 15:11, Jeppe Johansen wrote:
> Here's an ARM version that runs in 5 cycles on a Cortex A8:
>
>   mov r2,r1,lsr #5
>   mov r12,#1
>   ldr r3,[r0, r2, lsl #2]!
>   orr r2,r3,r12,lsl r1
>   str r2,[r0]
>   and r0,r12,r3,lsr r1
>
> It's one cycle faster than what the compiler can generate due to it not
> doing the pre-indexed writeback optimization when the address
> calculation has shifts.

Given that this code will be in a non-inlinable routine (we can't
inline routines with inline assembler), the Pascal version is probably
faster overall, since it avoids the call/return overhead.
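
For illustration, here is a minimal sketch of what an inlinable Pascal
equivalent might look like. It is not the code from this thread: the name
TestAndSetBit, the open-array parameter and the Boolean result are all
assumptions made for the example, and the bit index is masked explicitly
with "and 31". Whether the compiler actually inlines it depends on the
FPC version and settings, since "inline" is only a hint.

```pascal
program TestAndSetDemo;
{$mode objfpc}

{ Hypothetical helper: sets bit Index in a bitmap stored as 32-bit words
  and returns whether that bit was already set beforehand. }
function TestAndSetBit(var Bits: array of LongWord; Index: LongWord): Boolean; inline;
var
  Mask, Old: LongWord;
begin
  Mask := LongWord(1) shl (Index and 31);  { bit position within the word }
  Old := Bits[Index shr 5];                { word that holds the bit }
  Bits[Index shr 5] := Old or Mask;        { store the word with the bit set }
  Result := (Old and Mask) <> 0;           { report the previous state }
end;

var
  Bitmap: array[0..3] of LongWord;
begin
  FillChar(Bitmap, SizeOf(Bitmap), 0);
  WriteLn(TestAndSetBit(Bitmap, 37));  { bit was clear before this call }
  WriteLn(TestAndSetBit(Bitmap, 37));  { bit was set by the first call }
  WriteLn(Bitmap[1]);                  { bit 5 of word 1 is now set }
end.
```

Because the routine is plain Pascal, the compiler is free to expand it at
the call site, which is where the saving over a called assembler routine
would come from.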
Jonas
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel