> As Dmitry K. wrote:
>
> > I have look 'ffs' from newlib (1.12.0). It take up to 800 (!)
> > clocks. Reason: shift is included to loop.
>
> The slightly modified BSD code (counter reduced to 8 bits) yields
>
> sbiw r24,0
> breq .L1
> ldi r18,lo8(1)
> .L9:
> sbrc r24,0
> rjmp .L8
> lsr r25
> ror r24
> subi r18,lo8(-(1))
> rjmp .L9
> .L8:
> mov r24,r18
> clr r25
> .L1:
> ret
>
> This makes 6 clocks per cycle, so up to ~ 100 clocks max.
>
> I'm interested in seeing Dmitry's code...
> --
> cheers, J"org .-.-. --... ...-- -.. . DL8DTL
>
Just for fun, I tried implementing ffs in pure assembler. I haven't tested
this code, so I might have made a silly mistake somewhere, or counted cycles
incorrectly. However, it has a couple of nice features: by only looking at
either the lower or the upper byte (as appropriate), the inner loop is 4
cycles instead of 6, and the maximum number of iterations is 8. It only uses
r24 and r25, with no temporary registers, which might be an advantage if the
function were to be inlined. It's not quite as good as Dmitry's, being 13
words and a maximum of 43 cycles:
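
        .global ffs
ffs:
        subi    r24, 0          ; Test lo byte (Z set if it is 0)
        breq    .L1             ; Lo byte 0
        mov     r25, r24        ; Search the lo byte, r25 is the work register
        clr     r24             ; Result counts up from 0
        rjmp    .L2
.L1:
        subi    r25, 0          ; Test hi byte
        breq    .L3             ; Both bytes 0: return 0 (r25:r24 already 0)
        ldi     r24, 8          ; Start with offset of 8
.L2:
        inc     r24             ; Count this bit position
        asr     r25             ; Shift bit 0 out into carry
        brcc    .L2             ; Bit was clear: try the next one
        clr     r25             ; Hi byte of the result is 0
.L3:
        ret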

Words = 13

Cycles (for a result of "n"):

    0        6 + 5
    1..8     6 + n*4 + 5
    9..16    6 + (n-8)*4 + 5

Max = 43
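
Since the assembler above is untested, here is a minimal C sketch of the same byte-wise idea that can be checked on a PC. It is only a model, not the routine itself: the names ffs_bytewise, ffs_reference, stated_cycles and self_check are made up for illustration, and stated_cycles simply evaluates the estimates from the table above rather than measuring anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the byte-wise search: use the low byte if it is
   non-zero, otherwise the high byte with an offset of 8, then shift the
   chosen byte right until a set bit falls out. */
int ffs_bytewise(uint16_t x)
{
    uint8_t byte = (uint8_t)(x & 0xFFu);
    int result = 0;

    if (byte == 0) {
        byte = (uint8_t)(x >> 8);
        if (byte == 0)
            return 0;               /* no bit set at all */
        result = 8;                 /* start with an offset of 8 */
    }
    for (;;) {
        int carry = byte & 1;       /* the bit that the shift pushes out */
        result++;
        /* Logical shift here; the assembler uses asr, which only differs in
           bit 7, and the loop exits at the lowest set bit either way. */
        byte >>= 1;
        if (carry)
            return result;
    }
}

/* Straightforward reference: test each of the 16 bits in turn. */
int ffs_reference(uint16_t x)
{
    for (int i = 0; i < 16; i++)
        if (x & (1u << i))
            return i + 1;
    return 0;
}

/* Cycle estimate exactly as tabulated in the post (an estimate, not a
   measurement), for a result of n in the range 0..16. */
int stated_cycles(int n)
{
    if (n == 0)
        return 6 + 5;
    return 6 + (n <= 8 ? n : n - 8) * 4 + 5;
}

/* Compare the model against the reference for every 16-bit input. */
void self_check(void)
{
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        assert(ffs_bytewise((uint16_t)x) == ffs_reference((uint16_t)x));
}
```

Calling self_check() from a small test program walks all 65536 inputs, which at least confirms the algorithm; it says nothing about whether the cycle counts are right.
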
Best regards,
David
_______________________________________________
AVR-libc-dev mailing list
http://lists.nongnu.org/mailman/listinfo/avr-libc-dev