https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105904
Bug ID: 105904
Summary: Predicated mov r0, #1 with opposite conditions could be hoisted, between 1 and 1<<n in opposite sides of a branch
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: arm-*-*

#include <bit>   // using the libstdc++ header
unsigned roundup(unsigned x){ return std::bit_ceil(x); }

https://godbolt.org/z/Px1fvWaex

GCC's version is somewhat clunky, including a MOV r0, #1 on either "side":

roundup(unsigned int):
        cmp     r0, #1
        itttt   hi
        addhi   r3, r0, #-1
        movhi   r0, #1        @@ here
        clzhi   r3, r3
        rsbhi   r3, r3, #32
        ite     hi
        lslhi   r0, r0, r3
        movls   r0, #1        @@ here
        bx      lr

Even without spotting the other optimizations that clang finds, we can combine these into a single unconditional MOV r0, #1.  But only if we avoid setting flags, so it requires a 4-byte encoding, not MOVS.  Still, it's one fewer instruction to execute.

This is not totally trivial: it requires seeing that we can move the MOV across the conditional LSL.  So it's really a matter of folding the 1s between 1<<n and 1 on opposite sides of an if-converted branch.

        cmp     r0, #1
        ittt    hi
        addhi   r3, r0, #-1
        clzhi   r3, r3
        rsbhi   r3, r3, #32
        mov     r0, #1        @@ now unconditional
        it      hi
        lslhi   r0, r0, r3
        bx      lr

clang makes rather nice asm for ARMv7 -mcpu=cortex-a53, as discussed in PR104773, which covers a different missed optimization in the same asm.

roundup(unsigned int):         @@ clang's version
        subs    r0, r0, #1
        clz     r0, r0
        rsb     r1, r0, #32    @ 32-clz
        mov     r0, #1
        lslhi   r0, r0, r1     @ using flags set by SUBS
        bx      lr             @ 1<<(32-clz) or just 1

Folding the mov r0, #1 from either side is only a couple of steps away from making the clz and rsb unconditional, keeping only the LSL conditional.