I noticed that gcc does not handle constant shifts well: instead of using
optimized code, it replaces some left shifts (by 3 to 7 bits) on 32bit
variables with add and adc. If the shift is not optimized but just
unrolled, both variants should take the same number of cycles, but for
some reason gcc also adds some mov operations in the later part, which
makes performance even worse. In terms of code size, shifts are better
than add and adc operations. All const shifts could also be optimized
further, as one can see for the other variable sizes, e.g. 16bit. I should
mention that this only happens with left shifts on 32bit (maybe also on
24bit) and with some optimizer options other than Os.
I sent an optimisation for the const case to the patch mailing list, but I
was not able to figure out where this bad optimisation is coming from.
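
To illustrate what "optimized even better" could mean for a const shift by
7, here is a minimal sketch in C. It is not what gcc emits and not the
patch itself. The function name lshift32_by7_bytewise is made up for this
example. The idea is that a shift by 8 is only a byte move on an 8 bit
target, so a shift by 7 can be expressed as a byte move plus a single right
shift that carries in bit 24 of the input:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: computes value << 7
 * as "shift left by one byte, then shift right by one bit". */
uint32_t lshift32_by7_bytewise(uint32_t value) {
    /* Shift left by a whole byte; on an 8 bit target this needs
     * register moves only, no bit shifts. */
    uint32_t moved = value << 8;
    /* Bit 24 of the input is dropped by the byte move but must end up
     * in bit 31 of the result, so it is carried in separately. */
    uint32_t carry = (value >> 24) & 1u;
    /* A single right shift by one bit, with the carry placed on top. */
    return (moved >> 1) | (carry << 31);
}
```

The result is the same as a plain value << 7 for every 32bit input. Whether
this is actually shorter or faster on a given target is an assumption that
would need to be checked against the generated assembly.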

I prepared a Compiler Explorer example to give you an easy grasp of it:

In case Compiler Explorer does not work for you, the example code is:
unsigned long lshift32_c(const unsigned long value) {
    return value << 7;
}

In case of O0 this results in a lot of the wrong replacement by add and adc:
add r24,r24
adc r25,r25
adc r26,r26
adc r27,r27

With O2 it results in a lot of single, unoptimized shifts and some
additional useless mov near the end:
lsl r24
rol r25
rol r26
rol r27
movw r18,r24
movw r20,r26
lsl r18
rol r19
rol r20
rol r21
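
For the comparison with other variable sizes mentioned above, a 16bit
counterpart can be put next to the 32bit function in the same Compiler
Explorer session. This is only a sketch: the name lshift16_c is made up for
this example, and the fixed-width types from stdint.h are used instead of
unsigned long so the widths are the same on every target.

```c
#include <stdint.h>

/* Hypothetical 16bit counterpart, added only for comparison. */
uint16_t lshift16_c(const uint16_t value) {
    /* The cast truncates the promoted result back to 16 bits. */
    return (uint16_t)(value << 7);
}

/* Same operation as the original example, with an explicit 32bit type. */
uint32_t lshift32_c(const uint32_t value) {
    return value << 7;
}
```

Compiling both with the same optimizer option should make it easy to see
whether the 16bit version gets the optimized sequence while the 32bit one
does not.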
