https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615
Bug ID: 69615 Summary: 0 to limit signed range checks don't always use unsigned compare Product: gcc Version: 5.3.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: peter at cordes dot ca Target Milestone: --- gcc sometimes misses the unsigned-compare trick for checking if a signed value is between 0 and limit (where limit is known to be <= INT_MAX). It seems that gcc fails when the upper limit is a variable, even if I shift or mask it down to a small range. clang handles this case, so I'm sure I constructed my test case in a way that could be optimized. All the code in this bug report is on godbolt, for ease of trying with older versions of gcc (including for ARM64/ARM/PPC), and with clang / icc13. http://goo.gl/V7PFmv. (I used -x c to compile as C, even though it only provides c++ compilers). This appears to be arch-independent (unless my quick skim of asm for ISAs I barely know misled me...) -------- The simplest case is when the upper limit is a compile-time constant. There's one case where gcc and clang fail to optimize: x<=(INT_MAX-1), or equivalently, x<INT_MAX. #include <limits.h> #include <stdint.h> extern void ext(void); // clang and gcc both optimize range checks up to INT_MAX-2 to a single unsigned compare void r0_to_imax_2(int x){ if (x>=0 && x<=(INT_MAX-2)) ext(); } // good code void r0_to_imax_1(int x){ if (x>=0 && x<=(INT_MAX-1)) ext(); } // bad code void r0_to_imax (int x){ if (x>=0 && x<=(INT_MAX-0)) ext(); } // good code (test/js. not shown) gcc 5.3.0 -Ofast -mtune=haswell compiles this to: r0_to_imax_2: cmpl $2147483645, %edi # that's 0x7ffffffd jbe .L4 ret .L4: jmp ext r0_to_imax_1: movl %edi, %eax subl $0, %eax ## Without any -mtune, uses test edi,edi js .L5 cmpl $2147483647, %edi # that's 0x7fffffff je .L5 jmp ext .L5: ret ICC13 compiles this last one to cmp edi, 0x7ffffffe / ja, so unless my mental logic is wrong *and* icc13 is buggy, gcc and clang should still be able to use the same optimization as for smaller upper-limits. They don't: both clang and gcc use two compare-and-branches for r0_to_imax_1. BTW, the movl %edi, %eax / subl $0, %eax sequence is used instead of the test instruction with -mtune=haswell, and even worse with -march=bdver2 where it even prevents fusion into a compare-and-branch m-op. I'll file a separate bug report for that if anyone wants me to. Agner Fog's microarch guide doesn't mention anything that would give that sequence an advantage over test, unless I'm missing something. It slows AMD down more than (recent) Intel, but that's not what tuning for Haswell means. :P Now, on to the case where the limit is variable, but can easily be proven to itself be in the range [0 .. INT_MAX-1) or much smaller. (If the limit can be negative (or unsigned greater than INT_MAX) the optimization is impossible: INT_MIN and other negative numbers could be "below" the limit.) // gcc always fails to optimize this to an unsigned compare, but clang succeeds void rangecheck_var(int64_t x, int64_t lim2) { //lim2 >>= 60; lim2 &= 0xf; // let the compiler figure out the limited range of limit if (x>=0 && x<lim2) ext(); } compiles to (with -O3, and -mtune=core2 to avoid the sub $0 sillyness): rangecheck_var: andl $15, %esi cmpq %rdi, %rsi jle .L16 testq %rdi, %rdi js .L16 jmp ext .L16: ret