https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109907
Bug ID: 109907
Summary: [avr] Missed optimization for bit extraction (uses shift instead of single bit-test)
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: gjl at gcc dot gnu.org
Target Milestone: ---
Created attachment 55116
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55116&action=edit
C test case.
The following missed optimization occurs with current v14 master and also with
older versions of the compiler:
$ avr-gcc ext.c -dumpbase "" -save-temps -dp -mmcu=atmega128 -c -Os
Functions like
uint8_t cset_32bit31 (uint32_t x)
{
return (x & (1ul << 31)) ? 1 : 0; // bloat
}
that extract a single bit might generate very expensive code like:
cset_32bit31:
movw r26,r24 ; 18 [c=4 l=1] *movhi/0
movw r24,r22 ; 19 [c=4 l=1] *movhi/0
lsl r27 ; 24 [c=16 l=4] *ashrsi3_const/3
sbc r24,r24
mov r25,r24
movw r26,r24
andi r24,lo8(1) ; 12 [c=4 l=1] andqi3/1
ret ; 22 [c=0 l=1] return
where the following 3 instructions would suffice. This is smaller, faster and
imposes no additional register pressure:
bst r25,7 ; 16 [c=4 l=3] *extzv/4
clr r24
bld r24,0
Loading 0 or 1 depending on a single bit would also work:
LDI r24, 0 # R24 = 0
SBRC r25, 7 # Skip next instruction if R25.7 == 0.
LDI r24, 1 # R24 = 1
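For reference, the single-bit test can be written in several source forms that must all give the same result. The following is only a host-side sketch in plain C: cset_32bit31 is the function from this report, while cset_32bit31_shift, cset_32bit31_byte and check_forms_agree are hypothetical names introduced here for illustration. It makes no claim about which form avr-gcc compiles best; it only pins down the expected value.

```c
#include <assert.h>
#include <stdint.h>

/* Form from the report: mask the bit, then map it to 0 or 1. */
uint8_t cset_32bit31 (uint32_t x)
{
    return (x & (1ul << 31)) ? 1 : 0;
}

/* Hypothetical variant: shift the bit down to position 0 and mask it. */
uint8_t cset_32bit31_shift (uint32_t x)
{
    return (uint8_t) ((x >> 31) & 1);
}

/* Hypothetical variant: look only at bit 7 of the most significant byte,
   which is the bit that "bst r25,7" or "sbrc r25,7" would test. */
uint8_t cset_32bit31_byte (uint32_t x)
{
    uint8_t hi = (uint8_t) (x >> 24);
    return (hi & 0x80) ? 1 : 0;
}

/* Hypothetical self-check: all three forms must agree for a given input. */
void check_forms_agree (uint32_t x)
{
    assert (cset_32bit31 (x) == cset_32bit31_shift (x));
    assert (cset_32bit31 (x) == cset_32bit31_byte (x));
}
```

Since only one byte of the input matters, a single bit-test on that byte is enough and no 32-bit shift is needed.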
The bloat also occurs when the complement of the bit is extracted like in
uint8_t cset_32bit30_not (uint32_t x)
{
return (x & (1ul << 30)) ? 0 : 1; // bloat
}
cset_32bit30_not:
movw r26,r24 ; 19 [c=4 l=1] *movhi/0
movw r24,r22 ; 20 [c=4 l=1] *movhi/0
ldi r18,30 ; 25 [c=44 l=7] *lshrsi3_const/3
1:
lsr r27
ror r26
ror r25
ror r24
dec r18
brne 1b
ldi r18,1 ; 7 [c=32 l=2] xorsi3/2
eor r24,r18
andi r24,lo8(1) ; 13 [c=4 l=1] andqi3/1
ret ; 23 [c=0 l=1] return
This case is even worse because it uses a loop of 30 single bit-shifts to
extract the bit. Again, skipping one instruction depending on a bit would be
possible:
LDI r24, 1 # R24 = 1
SBRC r25, 6 # Skip next instruction if R25.6 == 0.
LDI r24, 0 # R24 = 0
or
LDI r24, 0 # R24 = 0
SBRS r25, 6 # Skip next instruction if R25.6 == 1.
LDI r24, 1 # R24 = 1
or extract one bit using the T-flag:
BST r25, 6 # SREG.T = R25.6
LDI r24, 0xff # R24 = 0xff
BLD r24, 0 # R24.0 = SREG.T
COM r24 # R24 = R24 ^ 0xff
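The three alternative sequences above can be checked against the C source with a small host-side model. This is only a sketch: model_sbrc, model_sbrs, model_bst_bld_com and check_models are hypothetical helper names, and the model assumes that R25 holds the most significant byte of the 32-bit argument, as the register usage in the generated code above suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Reference result: the function from the report. */
uint8_t cset_32bit30_not (uint32_t x)
{
    return (x & (1ul << 30)) ? 0 : 1;
}

/* Model of: LDI r24,1 / SBRC r25,6 / LDI r24,0 */
uint8_t model_sbrc (uint8_t r25)
{
    uint8_t r24 = 1;                /* LDI r24, 1 */
    if ((r25 & (1u << 6)) != 0)     /* SBRC skips the next insn if the bit is 0 */
        r24 = 0;                    /* LDI r24, 0 */
    return r24;
}

/* Model of: LDI r24,0 / SBRS r25,6 / LDI r24,1 */
uint8_t model_sbrs (uint8_t r25)
{
    uint8_t r24 = 0;                /* LDI r24, 0 */
    if ((r25 & (1u << 6)) == 0)     /* SBRS skips the next insn if the bit is 1 */
        r24 = 1;                    /* LDI r24, 1 */
    return r24;
}

/* Model of: BST r25,6 / LDI r24,0xff / BLD r24,0 / COM r24 */
uint8_t model_bst_bld_com (uint8_t r25)
{
    uint8_t t = (uint8_t) ((r25 >> 6) & 1);   /* BST: T = R25.6 */
    uint8_t r24 = 0xff;                       /* LDI r24, 0xff */
    r24 = (uint8_t) ((r24 & 0xfe) | t);       /* BLD: R24.0 = T */
    r24 = (uint8_t) ~r24;                     /* COM: R24 = R24 ^ 0xff */
    return r24;
}

/* Hypothetical self-check: every model must match the C function. */
void check_models (uint32_t x)
{
    uint8_t r25 = (uint8_t) (x >> 24);        /* assumed: MSB lives in R25 */
    uint8_t want = cset_32bit30_not (x);
    assert (model_sbrc (r25) == want);
    assert (model_sbrs (r25) == want);
    assert (model_bst_bld_com (r25) == want);
}
```

In the T-flag variant, bits 1 to 7 of R24 are preloaded with ones so that COM clears them and leaves only the inverted bit in position 0.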
-------------------------------------------------------
Configured with: --target=avr --disable-nls --with-dwarf2 --with-gnu-as
--with-gnu-ld --disable-shared --enable-languages=c,c++
gcc version 14.0.0 20230518 (experimental) (GCC)