https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92282

            Bug ID: 92282
           Summary: gimple for (a + ~b) is harder to optimize in RTL when
                    types are unsigned
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---

Given:

t f1(t a, t b) { return a + ~b; }

if t is of type int64_t, then the gimple produced is

  _1 = ~b_2(D);
  _4 = _1 + a_3(D);

On Arm this can then easily be optimized into a three-instruction sequence:

MVN  R2, R2
ADDS R0, R0, R2
SBC  R1, R1, R3

(because on Arm, SBC = Rn - Rm - ~C == Rn + ~Rm + C)
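
As a sanity check of that identity (an illustrative sketch, not part of the
report): in two's complement ~Rm == -Rm - 1, so Rn - Rm - ~C and Rn + ~Rm + C
wrap to the same value for either carry state.

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t rn = 0xdeadbeef, rm = 0x12345678;
    /* SBC computes Rn - Rm - NOT(carry); since ~Rm == -Rm - 1 in two's
       complement, this equals Rn + ~Rm + carry (mod 2^32) for carry 0 and 1.  */
    for (uint32_t c = 0; c <= 1; c++)
        assert(rn - rm - (1 - c) == rn + ~rm + c);
    return 0;
}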

But if the type is changed to uint64_t, then the gimple is transformed into

  _1 = a_2(D) - b_3(D);
  _4 = _1 + 18446744073709551615;

The constant 18446744073709551615 is just the unsigned encoding of -1, so this
is effectively (a - b) - 1.  This form is almost impossible for the back-end to
optimize back into the optimal sequence.  The result is that we end up with two
carry-propagating subtract operations instead of one, and with less parallelism
in the overall sequence, since the bit-wise invert can otherwise execute in
parallel on any super-scalar architecture.
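
For reproduction, a self-contained testcase (the function names and the exact
command line are mine, not from the report; -fdump-tree-optimized dumps the
gimple in question):

#include <stdint.h>

/* Signed: gimple keeps the ~b, so Arm can emit MVN/ADDS/SBC.  */
int64_t f1_s64(int64_t a, int64_t b) { return a + ~b; }

/* Unsigned: gimple is canonicalized to (a - b) + 18446744073709551615.  */
uint64_t f1_u64(uint64_t a, uint64_t b) { return a + ~b; }

Compile with, e.g., arm-none-eabi-gcc -O2 -S -fdump-tree-optimized test.c and
inspect the dump.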

Note that the same problem likely exists on 64-bit architectures if t is
uint128_t.
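
A 128-bit analogue for checking that (a sketch; note GCC spells the type
unsigned __int128 rather than uint128_t):

typedef unsigned __int128 u128;

/* Expected to show the same (a - b) + (2^128 - 1) canonicalization.  */
u128 f1_u128(u128 a, u128 b) { return a + ~b; }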
