On 11/09/16 15:11, Jeppe Johansen wrote:
Here's an ARM version that runs in 5 cycles on a Cortex A8:
    mov    r2,r1,lsr #5
    mov    r12,#1
    ldr    r3,[r0, r2, lsl #2]!
    orr    r2,r3,r12,lsl r1
    str    r2,[r0]
    and    r0,r12,r3,lsr r1

It's one cycle faster than what the compiler can generate, because the
compiler does not apply the pre-indexed writeback optimization when the
address calculation involves shifts.

Given that this code will be in a non-inlinable routine (we can't inline routines that contain inline assembler), the Pascal version is probably faster anyway, since it can be inlined and then has no call/return overhead.
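
For readers who don't read ARM assembly: the quoted routine appears to be a test-and-set on a bit array, taking the array base in r0 and the bit index in r1, and returning the previous value of the bit. A rough high-level equivalent, written here in C purely as an illustration (the name test_and_set_bit and the parameter names are made up for this sketch and don't come from the RTL), could look like this:

```c
#include <stdint.h>

/* Illustrative sketch only; the function and parameter names are hypothetical.
   Sets bit `index` in the array of 32-bit words `bits` and returns the
   previous value of that bit (0 or 1). */
static inline uint32_t test_and_set_bit(uint32_t *bits, uint32_t index)
{
    uint32_t *word = &bits[index >> 5];    /* word holding the bit: index div 32 */
    uint32_t shift = index & 31;           /* bit position inside that word */
    uint32_t old = *word;                  /* load the current word */

    *word = old | (UINT32_C(1) << shift);  /* store it back with the bit set */
    return (old >> shift) & 1;             /* report the bit's old value */
}
```

The sketch masks the index to 0..31 before shifting because shifting a 32-bit value by 32 or more is undefined in C. Being an ordinary `static inline` function, it can be inlined at each call site, which is the advantage the high-level version has over a routine containing inline assembler.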


Jonas

