Instead of jumping to a place that ROLs r_arg1 (with C=0), LSL r_arg1 can be performed prior to the loop. This reduces the number of loop iterations from 9 to 8.
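
For illustration only, here is a small C model of the two entry sequences. It is a sketch, not the libgcc code: the names (divmod8, udivmod8_old, udivmod8_new, count_mismatches) are made up, and the part of the loop that the hunk does not show (conditional subtract, rol r_arg1, dec/brne, final com) is assumed to follow the usual restoring-division shape.

```c
#include <stdint.h>

/* Hypothetical result type: quotient, remainder, and how many times the
   loop tail (rol r_arg1 / dec / brne) was executed. */
typedef struct { uint8_t quot, rem; unsigned passes; } divmod8;

/* ROL: rotate left through the carry flag. */
static uint8_t rol(uint8_t r, unsigned *c)
{
    unsigned out = r >> 7;
    r = (uint8_t)((r << 1) | *c);
    *c = out;
    return r;
}

/* Assumed loop body: rol r_rem; cp r_rem,r_arg2; subtract unless C is set. */
static uint8_t step_rem(uint8_t rem, uint8_t arg2, unsigned *c)
{
    rem = rol(rem, c);
    *c = rem < arg2;                 /* cp sets C when rem < arg2 */
    if (!*c)
        rem = (uint8_t)(rem - arg2); /* sub leaves C = 0 in this case */
    return rem;
}

/* Old entry: clear rem and carry, count 9, jump to the rol r_arg1. */
divmod8 udivmod8_old(uint8_t arg1, uint8_t arg2)
{
    divmod8 r = {0, 0, 0};
    uint8_t rem = 0;
    unsigned c = 0;                  /* sub r_rem,r_rem clears carry */
    unsigned cnt = 9;
    for (;;) {
        arg1 = rol(arg1, &c);        /* entry point */
        r.passes++;
        if (--cnt == 0)
            break;
        rem = step_rem(rem, arg2, &c);
    }
    r.quot = (uint8_t)~arg1;         /* com r_arg1 */
    r.rem = rem;
    return r;
}

/* New entry: clr rem, count 8, lsl r_arg1, fall through into the loop. */
divmod8 udivmod8_new(uint8_t arg1, uint8_t arg2)
{
    divmod8 r = {0, 0, 0};
    uint8_t rem = 0;                 /* clr leaves carry alone */
    unsigned c = arg1 >> 7;          /* lsl: top bit goes to carry */
    arg1 = (uint8_t)(arg1 << 1);
    for (unsigned cnt = 8; cnt != 0; cnt--) {
        rem = step_rem(rem, arg2, &c);
        arg1 = rol(arg1, &c);
        r.passes++;
    }
    r.quot = (uint8_t)~arg1;         /* com r_arg1 */
    r.rem = rem;
    return r;
}

/* Compare both models against C's / and % for every nonzero divisor. */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned n = 0; n < 256; n++)
        for (unsigned d = 1; d < 256; d++) {
            divmod8 a = udivmod8_old((uint8_t)n, (uint8_t)d);
            divmod8 b = udivmod8_new((uint8_t)n, (uint8_t)d);
            if (a.quot != n / d || a.rem != n % d ||
                b.quot != a.quot || b.rem != a.rem)
                bad++;
        }
    return bad;
}
```

In this model the initial lsl does the same work as the first rol with C=0, so both variants produce identical results while the new one runs the counted loop tail one time less.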

Applied as obvious.

Johann

AVR: target/114794 - Tweak __udivmodqi4

libgcc/
	PR target/114794
	* config/avr/lib1funcs.S (__udivmodqi4): Tweak.

diff --git a/libgcc/config/avr/lib1funcs.S b/libgcc/config/avr/lib1funcs.S
index 535510ab867..af4d7d97016 100644
--- a/libgcc/config/avr/lib1funcs.S
+++ b/libgcc/config/avr/lib1funcs.S
@@ -1339,9 +1339,9 @@ DEFUN __umulsidi3
 #if defined (L_udivmodqi4)
 DEFUN __udivmodqi4
-	sub	r_rem,r_rem	; clear remainder and carry
-	ldi	r_cnt,9		; init loop counter
-	rjmp	__udivmodqi4_ep	; jump to entry point
+	clr	r_rem		; clear remainder
+	ldi	r_cnt,8		; init loop counter
+	lsl	r_arg1		; shift dividend
 __udivmodqi4_loop:
 	rol	r_rem		; shift dividend into remainder
 	cp	r_rem,r_arg2	; compare remainder & divisor