https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112268

            Bug ID: 112268
           Summary: AVR-GCC generates suboptimal code for bit shifts
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: anton at tchekov dot net
  Target Milestone: ---

This function:

uint8_t extract1(uint32_t val)
{
    return val >> 26;
}

generates the following assembly code, which shifts all four registers
26 times in a loop. The output is identical at optimization levels -O2,
-O3 and -Os. (The exact shift amounts are not important.)

extract1:
    mov r27,r25
    mov r26,r24
    mov r25,r23
    mov r24,r22
    ldi r18,26
1:
    lsr r27
    ror r26
    ror r25
    ror r24
    dec r18
    brne 1b
    ret

It is possible to do much better with this workaround, which is exactly
equivalent and compiles to only three instructions plus the return:

uint8_t extract2(uint32_t val)
{
    uint8_t tmp = val >> 24;
    return tmp >> 2;
}

extract2:
    mov r24,r25
    lsr r24
    lsr r24
    ret

The "shift loop" appears only with 32-bit integers. With 16-bit
integers, the optimization opportunity is recognized:

uint8_t extract3(uint16_t val)
{
    return val >> 9;
}

extract3:
    mov r24,r25
    lsr r24
    ret
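
The equivalence claimed for the workaround can be checked on a host
machine. The following is a minimal sketch, not part of the original
report: extract1 and extract2 are copied from above with explicit casts
added, and count_mismatches is a hypothetical helper that compares the
two on every possible top byte combined with a few low-bit patterns. It
checks results only and says nothing about the code avr-gcc generates.

```c
#include <stdint.h>

/* Direct form from the report: one 32-bit shift, truncated to 8 bits. */
uint8_t extract1(uint32_t val)
{
    return (uint8_t)(val >> 26);
}

/* Workaround from the report: take the top byte, then shift within 8 bits. */
uint8_t extract2(uint32_t val)
{
    uint8_t tmp = (uint8_t)(val >> 24);
    return (uint8_t)(tmp >> 2);
}

/*
 * Hypothetical helper (an assumption for illustration, not from the report).
 * Returns the number of sampled inputs on which the two forms disagree.
 * Samples all 256 values of the top byte, each combined with four
 * different patterns in the lower 24 bits.
 */
unsigned long count_mismatches(void)
{
    static const uint32_t low[4] = {
        0x000000u, 0xFFFFFFu, 0x555555u, 0xAAAAAAu
    };
    unsigned long mismatches = 0;

    for (uint32_t top = 0; top < 256; ++top) {
        for (unsigned i = 0; i < 4; ++i) {
            uint32_t v = (top << 24) | low[i];
            if (extract1(v) != extract2(v))
                ++mismatches;
        }
    }
    return mismatches;
}
```

Since the result depends only on bits 26 to 31, both forms ignore the
lower three bytes entirely, which is why the workaround can operate on
the top register alone.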