https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92282
            Bug ID: 92282
           Summary: gimple for (a + ~b) is harder to optimize in RTL when
                    types are unsigned
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---

Given:

t f1(t a, t b) { return a + ~b; }

if t is of type int64_t, then the gimple produced is

  _1 = ~b_2(D);
  _4 = _1 + a_3(D);

which on Arm can then easily be optimized into a 3-instruction sequence:

  MVN  R2, R2
  ADDS R0, R0, R2
  SBC  R1, R1, R3

(because on Arm, SBC = Rn - Rm - ~C == Rn + ~Rm + C).

But if the type is changed to uint64_t, then the gimple is transformed into

  _1 = a_2(D) - b_3(D);
  _4 = _1 + 18446744073709551615;

which is almost impossible for the back-end to optimize back into the
optimal sequence.  The result is that we end up with two carry-propagating
subtract operations instead of one, and less parallelism in the overall
sequence, since the bit-wise invert can execute in parallel with other
work on any super-scalar architecture.

Note that the same problem likely exists on 64-bit architectures if t is
uint128_t.
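
For reference, a minimal self-contained sketch (not part of the original
report; the function names are made up for illustration) that checks the
identity the canonicalization relies on: in two's complement,
a + ~b == (a - b) - 1, and adding 18446744073709551615 to a uint64_t is
the same as subtracting 1 modulo 2^64.  Both forms are semantically
equal; the complaint is only about which form the back-end can
pattern-match into the MVN/ADDS/SBC sequence.

#include <stdint.h>
#include <stdio.h>

/* Form kept for signed types: one bitwise NOT plus one add.  */
uint64_t add_not_form(uint64_t a, uint64_t b)
{
    return a + ~b;
}

/* Canonical form produced for unsigned types: a subtraction
   followed by adding (uint64_t)-1.  */
uint64_t sub_minus_one_form(uint64_t a, uint64_t b)
{
    return (a - b) + 18446744073709551615ULL;  /* == (a - b) - 1 */
}

int main(void)
{
    uint64_t tests[][2] = {
        {0, 0}, {1, 0}, {0, 1}, {UINT64_MAX, 1},
        {0x123456789abcdef0ULL, 0x0fedcba987654321ULL},
    };
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        uint64_t a = tests[i][0], b = tests[i][1];
        if (add_not_form(a, b) != sub_minus_one_form(a, b)) {
            printf("mismatch at a=%llu b=%llu\n",
                   (unsigned long long)a, (unsigned long long)b);
            return 1;
        }
    }
    puts("a + ~b == (a - b) - 1 for all tested values");
    return 0;
}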