https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115084
Bug ID: 115084
Summary: Missed optimization in division for AVR target, not using __*divmodpsi4
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: kamkaz at windowslive dot com
Target Milestone: ---

When both the numerator and the denominator of an integer division are known to fit in 24 bits or fewer, AVR code generation would be much more efficient using the psi (24-bit) routines instead of the si (32-bit) routines.

avr-gcc -O2

    #include <stdint.h>

    uint16_t division(uint16_t den)
    {
        return 7500000u/den;
    }

The generated code uses __udivmodsi4, even though __udivmodpsi4 would be suitable:

    00000000 <division>:
       0:   28 2f   mov   r18, r24
       2:   39 2f   mov   r19, r25
       4:   40 e0   ldi   r20, 0x00   ; 0
       6:   50 e0   ldi   r21, 0x00   ; 0
       8:   60 ee   ldi   r22, 0xE0   ; 224
       a:   70 e7   ldi   r23, 0x70   ; 112
       c:   82 e7   ldi   r24, 0x72   ; 114
       e:   90 e0   ldi   r25, 0x00   ; 0
      10:   03 d0   rcall .+6         ; 0x18 <__udivmodsi4>
      12:   82 2f   mov   r24, r18
      14:   93 2f   mov   r25, r19
      16:   08 95   ret

Would it be considered to add an optimization that uses the *psi routines in such cases? I expect it to be worthwhile, since division is rather computation-heavy: it would save ~250 cycles per division where applicable, while the added division routine occupies 54 bytes.

A workaround is to cast to (__uint24) before dividing, but the __uint24 extension is not widely known, and one would expect such an optimization to trigger by itself.
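For reference, the (__uint24) workaround mentioned in the report might look roughly like the sketch below. The names u24, DIVIDEND and division_u24 are hypothetical and do not appear in the report. The uint32_t fallback is an assumption added only so the file also builds on non-AVR hosts. Whether the cast really makes avr-gcc call __udivmodpsi4 is taken from the report and has not been checked here.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* u24 is a hypothetical alias. On AVR it maps to the avr-gcc __uint24
   extension; elsewhere it falls back to uint32_t (an assumption made for
   portability), which yields the same quotient. */
#ifdef __AVR__
typedef __uint24 u24;
#else
typedef uint32_t u24;
#endif

#define DIVIDEND 7500000u   /* 0x7270E0, the constant used in the report */

/* The workaround only makes sense while the dividend fits in 24 bits. */
static_assert(DIVIDEND <= 0xFFFFFFu, "dividend must fit in 24 bits");

/* Casting the dividend narrows the division to the 24-bit type on AVR.
   As in the original function, den must not be 0. */
uint16_t division_u24(uint16_t den)
{
    return (u24)DIVIDEND / den;
}
```

The result is unchanged from the original division() for every nonzero den, because both operands fit in 24 bits; only the width of the division differs.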