https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615

            Bug ID: 69615
           Summary: 0 to limit signed range checks don't always use
                    unsigned compare
           Product: gcc
           Version: 5.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

gcc sometimes misses the unsigned-compare trick for checking if a signed value
is between 0 and limit (where limit is known to be <= INT_MAX).

It seems that gcc fails when the upper limit is a variable, even if I shift or
mask it down to a small range.  clang handles this case, so I'm sure I
constructed my test case in a way that could be optimized.



All the code in this bug report is on godbolt, for ease of trying with older
versions of gcc (including for ARM64/ARM/PPC), and with clang / icc13. 
http://goo.gl/V7PFmv.   (I used -x c to compile as C, even though it only
provides c++ compilers).

This appears to be arch-independent (unless my quick skim of asm for ISAs I
barely know misled me...)

--------

The simplest case is when the upper limit is a compile-time constant.  There's
one case where gcc and clang fail to optimize:  x<=(INT_MAX-1), or
equivalently, x<INT_MAX.


#include <limits.h>
#include <stdint.h>
extern void ext(void);

// clang and gcc both optimize range checks up to INT_MAX-2 to a single
unsigned compare
void r0_to_imax_2(int x){ if (x>=0 && x<=(INT_MAX-2)) ext(); }  // good code
void r0_to_imax_1(int x){ if (x>=0 && x<=(INT_MAX-1)) ext(); }  // bad code
void r0_to_imax  (int x){ if (x>=0 && x<=(INT_MAX-0)) ext(); }  // good code
(test/js.  not shown)

gcc 5.3.0 -Ofast -mtune=haswell  compiles this to:

r0_to_imax_2:
        cmpl    $2147483645, %edi   # that's 0x7ffffffd
        jbe     .L4
        ret
.L4:    jmp     ext

r0_to_imax_1:
        movl    %edi, %eax
        subl    $0, %eax       ## Without any -mtune, uses test edi,edi
        js      .L5
        cmpl    $2147483647, %edi   # that's 0x7fffffff
        je      .L5
        jmp     ext
.L5:    ret

ICC13 compiles this last one to cmp  edi, 0x7ffffffe / ja, so unless my mental
logic is wrong *and* icc13 is buggy, gcc and clang should still be able to  use
the same optimization as for smaller upper-limits.  They don't: both clang and
gcc use two compare-and-branches for r0_to_imax_1.

BTW, the movl %edi, %eax / subl $0, %eax sequence is used instead of the test
instruction with -mtune=haswell, and even worse with -march=bdver2 where it
even prevents fusion into a compare-and-branch m-op.  I'll file a separate bug
report for that if anyone wants me to.  Agner Fog's microarch guide doesn't
mention anything that would give that sequence an advantage over test, unless
I'm missing something.  It slows AMD down more than (recent) Intel, but that's
not what tuning for Haswell means. :P



Now, on to the case where the limit is variable, but can easily be proven to
itself be in the range [0 .. INT_MAX-1) or much smaller.  (If the limit can be
negative (or unsigned greater than INT_MAX) the optimization is impossible: 
INT_MIN and other negative numbers could be "below" the limit.)


// gcc always fails to optimize this to an unsigned compare, but clang succeeds
void rangecheck_var(int64_t x, int64_t lim2) {
  //lim2 >>= 60;
  lim2 &= 0xf;  // let the compiler figure out the limited range of limit
  if (x>=0 && x<lim2) ext();
}

compiles to (with -O3, and -mtune=core2 to avoid the sub $0 sillyness):

rangecheck_var:
        andl    $15, %esi
        cmpq    %rdi, %rsi
        jle     .L16
        testq   %rdi, %rdi
        js      .L16
        jmp     ext
.L16:   ret

Reply via email to