https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118012
--- Comment #4 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
It's even crazier when the device doesn't have a MUL instruction. In that case,
a libgcc function is used. With -Os, the call takes less code than the
bit-extract + extend + neg + and sequence, so a library call is emitted:
$ avr-gcc -S -Os gcc.dg/tree-ssa/branchless-cond.c -dp
f1:
/* prologue: function */
mov r18,r22 ; 32 [c=4 l=2] *movhi/0
mov r19,r23
mov r22,r20 ; 33 [c=4 l=1] movqi_insn/0
mov r23,r21 ; 34 [c=4 l=1] movqi_insn/0
andi r24,1 ; 35 [c=8 l=2] *andhi3/2
clr r25
rcall __mulhi3 ; 36 [c=4 l=1] *mulhi3_call
eor r24,r18 ; 40 [c=4 l=1] *xorqi3
eor r25,r19 ; 41 [c=4 l=1] *xorqi3
/* epilogue start */
ret ; 44 [c=0 l=1] return
The moves needed to accommodate the ABI eat up all the size gains, and the call
introduces more register pressure / clobbers.
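
For reference, a minimal C sketch of the three equivalent forms in play. It assumes f1 has the shape the listing implies: the low bit of the first argument (r24:r25) selects whether the third argument (r20:r21) is XORed into the second (r22:r23). The function names are hypothetical and not taken from the test case, and uint16_t stands in for avr-gcc's 16-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: XOR z into y only when bit 0 of x is set. */
uint16_t f1_branch(uint16_t x, uint16_t y, uint16_t z)
{
    return (x & 1) ? (uint16_t)(y ^ z) : y;
}

/* Multiply form: (x & 1) is 0 or 1, so the product is 0 or z.
   Without a hardware MUL this becomes the __mulhi3 call. */
uint16_t f1_mul(uint16_t x, uint16_t y, uint16_t z)
{
    return (uint16_t)(((x & 1) * z) ^ y);
}

/* Mask form: bit-extract + extend + neg + and.
   Negating 0 or 1 gives an all-zeros or all-ones mask. */
uint16_t f1_mask(uint16_t x, uint16_t y, uint16_t z)
{
    uint16_t mask = (uint16_t)(0u - (x & 1u));
    return (uint16_t)((mask & z) ^ y);
}

/* Spot-check that all three forms agree on a few inputs. */
void check_forms_agree(void)
{
    static const uint16_t v[] = { 0x0000, 0x0001, 0x1234, 0x8000, 0xffff };
    const unsigned n = sizeof v / sizeof v[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (unsigned k = 0; k < n; k++) {
                uint16_t r = f1_branch(v[i], v[j], v[k]);
                assert(f1_mul(v[i], v[j], v[k]) == r);
                assert(f1_mask(v[i], v[j], v[k]) == r);
            }
}
```

All three compute the same value, so the choice between them is purely a cost question. On a device without MUL, the multiply form only looks cheap if the argument shuffling and call clobbers around __mulhi3 are left out of the estimate.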