https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115084

            Bug ID: 115084
           Summary: Missed optimization in division for AVR target, not
                    using __*divmodpsi4
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kamkaz at windowslive dot com
  Target Milestone: ---

When both the numerator and the denominator of an integer division are known to
fit in 24 bits, it would be much more efficient for AVR code generation to use
the psi (24-bit) routines instead of the si (32-bit) routines.

avr-gcc -O2

#include <stdint.h>
uint16_t division(uint16_t den) {
    return 7500000u/den;
}

The generated code calls __udivmodsi4, even though __udivmodpsi4 would suffice.

00000000 <division>:
   0:   28 2f           mov     r18, r24
   2:   39 2f           mov     r19, r25
   4:   40 e0           ldi     r20, 0x00       ; 0
   6:   50 e0           ldi     r21, 0x00       ; 0
   8:   60 ee           ldi     r22, 0xE0       ; 224
   a:   70 e7           ldi     r23, 0x70       ; 112
   c:   82 e7           ldi     r24, 0x72       ; 114
   e:   90 e0           ldi     r25, 0x00       ; 0
  10:   03 d0           rcall   .+6             ; 0x18 <__udivmodsi4>
  12:   82 2f           mov     r24, r18
  14:   93 2f           mov     r25, r19
  16:   08 95           ret

Would you consider adding an optimization that uses the *psi routines in such
cases? I expect it to be worthwhile, since division is rather computation
heavy. It would save ~250 cycles per division where applicable, while the added
division routine occupies 54 bytes.

A workaround is to cast to (__uint24) before dividing, but the __uint24
extension is not widely known, and one would expect such an optimization to
trigger by itself.
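
For illustration, the workaround described above might look like the sketch
below. The names u24 and division_u24 are hypothetical, and the uint32_t
fallback is only an assumption so that the snippet also compiles on a non-AVR
host; on AVR the typedef resolves to the __uint24 extension type.

```c
#include <stdint.h>

/* __uint24 is an avr-gcc extension. The uint32_t fallback is an assumption
   made only so this sketch also builds on a non-AVR host. */
#ifdef __AVR__
typedef __uint24 u24;
#else
typedef uint32_t u24;
#endif

/* The cast is only safe if the constant numerator fits in 24 bits. */
_Static_assert(7500000u <= 0xFFFFFFu, "numerator must fit in 24 bits");

/* Same computation as division() in the report, with the numerator cast to
   the 24-bit type so that the division is carried out in 24 bits on AVR. */
uint16_t division_u24(uint16_t den) {
    return (u24)7500000u / den;
}
```

The result is the same as with the 32-bit division because both operands fit
in 24 bits; only the width of the arithmetic on AVR changes.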
