https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81300
Bug ID: 81300
Summary: -fpeephole2 breaks __builtin_ia32_sbb_u64, _subborrow_u64 on AMD64
Product: gcc
Version: 7.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andreser-gccbugs at mit dot edu
Target Milestone: ---

Here is a short program that, when compiled with gcc 7.1.1, gives different
output with "-O1 -fpeephole2 -m64" and with "-O1 -m64".

    int main() {
      unsigned long long _discard = 0, zero = 0, maxull = 0;
      unsigned char zero1 = __builtin_ia32_addcarryx_u64(0, 0, 0, &_discard);
      unsigned char zero2 = __builtin_ia32_addcarryx_u64(zero1, 0, 0, &zero);
      __builtin_ia32_sbb_u64(0x0, 2, -1, &_discard);
      unsigned char one = __builtin_ia32_sbb_u64(0, zero, 1, &maxull);
      unsigned long long x = __builtin_ia32_sbb_u64(one, zero2, 0, &_discard);

      unsigned long long z1 = 0;
      __asm__ ("movq %1, %0;" :"+r"(z1) :"r"(x));

      unsigned long long z2 = 3;
      __asm__ ("movq %1, %0;" :"+r"(z2) :"r"(x));

      return 1-(z1 | z2);
    }

Without -fpeephole2, the exit code is 0. With -fpeephole2, the exit code is 1.

I think this program should be deterministic, so I am tentatively attributing
the difference to a flaw in the peephole2 optimizations. Disassembling the
compiled code indeed shows that one of the SBB intrinsics has been dropped.
Of course, this by itself isn't evidence of anything going wrong, as the whole
program could in principle be constant-propagated away, but what is going on
looks off to me.

Annotated side-by-side diff of relevant disassembly:
http://web.mit.edu/~andreser/Public/O1-fpeephole2.diff.html

The same disassembly for email-users' convenience.
O1:

0000000000000000 <main>:
   0:   bf 00 00 00 00          mov    $0x0,%edi
   5:   b8 00 00 00 00          mov    $0x0,%eax
   a:   ba 00 00 00 00          mov    $0x0,%edx
   f:   80 c2 ff                add    $0xff,%dl
  12:   48 89 c1                mov    %rax,%rcx
  15:   48 11 c1                adc    %rax,%rcx
  18:   0f 92 c2                setb   %dl          ; dl = 0
  1b:   be 01 00 00 00          mov    $0x1,%esi
  20:   40 80 c7 ff             add    $0xff,%dil
  24:   48 19 f1                sbb    %rsi,%rcx    ; rcx-rsi = 0 - 1 = 0xff...ff, CF = 1
  27:   0f 92 c1                setb   %cl          ; cl = 1
  2a:   0f b6 d2                movzbl %dl,%edx
  2d:   80 c1 ff                add    $0xff,%cl    ; cl = 0; CF = 1
  30:   48 19 c2                sbb    %rax,%rdx    ; rdx = -1; CF = 1
  33:   0f 92 c1                setb   %cl
  36:   0f b6 c9                movzbl %cl,%ecx
  39:   48 89 c8                mov    %rcx,%rax
  3c:   ba 03 00 00 00          mov    $0x3,%edx
  41:   48 89 ca                mov    %rcx,%rdx
  44:   09 d0                   or     %edx,%eax
  46:   ba 01 00 00 00          mov    $0x1,%edx
  4b:   29 c2                   sub    %eax,%edx
  4d:   89 d0                   mov    %edx,%eax
  4f:   c3                      retq

With -fpeephole2:

0000000000000000 <main>:
   0:   31 c0                   xor    %eax,%eax
   2:   31 d2                   xor    %edx,%edx
   4:   80 c2 ff                add    $0xff,%dl
   7:   48 89 c1                mov    %rax,%rcx
   a:   48 11 c1                adc    %rax,%rcx
   d:   0f 92 c2                setb   %dl          ; dl = 0 ; cl = 0
  10:   0f b6 d2                movzbl %dl,%edx
  13:   31 c9                   xor    %ecx,%ecx    ; cl = 0; CF = 0
  15:   48 19 c2                sbb    %rax,%rdx    ; rdx = 0; CF = 0
  18:   0f 92 c1                setb   %cl
  1b:   48 89 c8                mov    %rcx,%rax
  1e:   ba 03 00 00 00          mov    $0x3,%edx
  23:   48 89 ca                mov    %rcx,%rdx
  26:   09 d0                   or     %edx,%eax
  28:   ba 01 00 00 00          mov    $0x1,%edx
  2d:   29 c2                   sub    %eax,%edx
  2f:   89 d0                   mov    %edx,%eax
  31:   c3                      retq
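
For reference, the expected exit code can be cross-checked against a portable C model of the two intrinsics. This is only a sketch, not GCC's implementation: model_adc_u64, model_sbb_u64, final_step, expected_exit_code and exit_code_if_borrow_dropped are hypothetical helper names introduced here, and the model assumes that the SBB builtin computes a - b - borrow_in and returns the borrow-out, which is consistent with the annotated -O1 disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model (not GCC code):
 * *out = a + b + carry_in, returns the carry-out. */
unsigned char model_adc_u64(unsigned char carry_in, uint64_t a, uint64_t b,
                            uint64_t *out)
{
    uint64_t c = carry_in ? 1 : 0;
    uint64_t partial = a + b;            /* wraps modulo 2^64 */
    *out = partial + c;
    return (unsigned char)((partial < a) || (*out < partial));
}

/* Hypothetical reference model (not GCC code):
 * *out = a - b - borrow_in, returns the borrow-out. */
unsigned char model_sbb_u64(unsigned char borrow_in, uint64_t a, uint64_t b,
                            uint64_t *out)
{
    uint64_t c = borrow_in ? 1 : 0;
    uint64_t partial = a - b;            /* wraps modulo 2^64 */
    *out = partial - c;
    return (unsigned char)((a < b) || (partial < c));
}

/* Last step of the test program, parameterised on the borrow that is
 * fed into the final SBB. */
int final_step(unsigned char borrow_in, uint64_t zero2)
{
    uint64_t discard;
    /* x holds the borrow-out of the final SBB, widened to 64 bits. */
    uint64_t x = model_sbb_u64(borrow_in, zero2, 0, &discard);
    /* Both inline-asm statements in the report overwrite their output with x. */
    uint64_t z1 = x;
    uint64_t z2 = x;
    return (int)(1 - (z1 | z2));
}

/* Replays the whole test program on the model. */
int expected_exit_code(void)
{
    uint64_t discard = 0, zero = 0, maxull = 0;
    unsigned char zero1 = model_adc_u64(0, 0, 0, &discard);
    unsigned char zero2 = model_adc_u64(zero1, 0, 0, &zero);
    model_sbb_u64(0, 2, UINT64_MAX, &discard);   /* -1 as an unsigned 64-bit value */
    unsigned char one = model_sbb_u64(0, zero, 1, &maxull);
    assert(one == 1 && maxull == UINT64_MAX);    /* 0 - 1 borrows */
    return final_step(one, zero2);
}

/* Same final step, but with the incoming borrow forced to 0, as the
 * "xor %ecx,%ecx ; CF = 0" annotation in the -fpeephole2 listing suggests. */
int exit_code_if_borrow_dropped(void)
{
    return final_step(0, 0);
}
```

Under these assumptions, expected_exit_code() returns 0, matching the -O1 build, while exit_code_if_borrow_dropped() returns 1, matching the -fpeephole2 build. That agrees with the reading that the borrow produced by the second SBB is lost before the third one.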