[Bug rtl-optimization/81300] New: -fpeephole2 breaks __builtin_ia32_sbb_u64, _subborrow_u64 on AMD64

andreser-gccbugs at mit dot edu Mon, 03 Jul 2017 19:41:38 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81300


            Bug ID: 81300
           Summary: -fpeephole2 breaks __builtin_ia32_sbb_u64,
                    _subborrow_u64 on AMD64
           Product: gcc
           Version: 7.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andreser-gccbugs at mit dot edu
  Target Milestone: ---

Here is a short program that, when compiled with gcc 7.1.1, gives different
output with "-O1 -fpeephole2 -m64" than with "-O1 -m64".

int main() {
  unsigned long long _discard = 0, zero = 0, maxull = 0;
  unsigned char zero1 = __builtin_ia32_addcarryx_u64(0, 0, 0, &_discard);
  unsigned char zero2 = __builtin_ia32_addcarryx_u64(zero1, 0, 0, &zero);
  __builtin_ia32_sbb_u64(0x0, 2, -1, &_discard);
  unsigned char one = __builtin_ia32_sbb_u64(0, zero, 1, &maxull);
  unsigned long long x = __builtin_ia32_sbb_u64(one, zero2, 0, &_discard);
  unsigned long long z1 = 0;
  __asm__ ("movq %1, %0;" :"+r"(z1) :"r"(x));
  unsigned long long z2 = 3;
  __asm__ ("movq %1, %0;" :"+r"(z2) :"r"(x));
  return 1-(z1 | z2);
}

Without -fpeephole2, the exit code is 0. With -fpeephole2, the exit code is 1.
I think this program should be deterministic, so I am tentatively attributing
the difference to a flaw in the peephole2 optimizations. Disassembling the
compiled code shows that one of the SBB instructions has been dropped. That by
itself is not evidence of a bug, since the whole program could in principle be
constant-propagated away, but the two builds return different exit codes, so
the transformation does not look valid to me.
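
As a cross-check on the expected result, here is a minimal sketch that
evaluates the same sequence of operations with a portable model of the carry
and borrow semantics instead of the builtins. The names model_addcarry,
model_subborrow and expected_exit_code are made up for this illustration and
are not part of GCC. The model assumes the builtins take (carry_in, a, b, out),
compute a + b + carry_in and a - b - borrow_in respectively, and return the
carry or borrow out; it also assumes each inline asm simply copies x into its
output.

```c
typedef unsigned long long u64;

/* Hypothetical model of ADC: *out = a + b + c_in, returns the carry out. */
static unsigned char model_addcarry(unsigned char c_in, u64 a, u64 b, u64 *out) {
    u64 t = a + b;                        /* wraps modulo 2^64 */
    unsigned char c1 = (unsigned char)(t < a);
    u64 s = t + (u64)(c_in != 0);
    unsigned char c2 = (unsigned char)(s < t);
    *out = s;
    return (unsigned char)(c1 | c2);
}

/* Hypothetical model of SBB: *out = a - b - b_in, returns the borrow out. */
static unsigned char model_subborrow(unsigned char b_in, u64 a, u64 b, u64 *out) {
    u64 t = a - b;                        /* wraps modulo 2^64 */
    unsigned char b1 = (unsigned char)(a < b);
    unsigned char b2 = (unsigned char)(t < (u64)(b_in != 0));
    *out = t - (u64)(b_in != 0);
    return (unsigned char)(b1 | b2);
}

/* Same operation sequence as the reproducer, using the models above. */
int expected_exit_code(void) {
    u64 discard = 0, zero = 0, maxull = 0;
    unsigned char zero1 = model_addcarry(0, 0, 0, &discard);
    unsigned char zero2 = model_addcarry(zero1, 0, 0, &zero);
    model_subborrow(0, 2, (u64)-1, &discard);
    unsigned char one = model_subborrow(0, zero, 1, &maxull);
    u64 x = model_subborrow(one, zero2, 0, &discard);
    u64 z1 = x;                           /* assumed effect of the first movq */
    u64 z2 = x;                           /* assumed effect of the second movq */
    return (int)(1 - (z1 | z2));
}
```

Under those assumptions the model gives one = 1 and x = 1, so the expected exit
code is 0, which matches the build without -fpeephole2.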

Annotated side-by-side diff of relevant disassembly:
http://web.mit.edu/~andreser/Public/O1-fpeephole2.diff.html

The same disassembly for email-users' convenience. O1:

0000000000000000 <main>:
   0:   bf 00 00 00 00          mov    $0x0,%edi
   5:   b8 00 00 00 00          mov    $0x0,%eax
   a:   ba 00 00 00 00          mov    $0x0,%edx
   f:   80 c2 ff                add    $0xff,%dl
  12:   48 89 c1                mov    %rax,%rcx
  15:   48 11 c1                adc    %rax,%rcx
  18:   0f 92 c2                setb   %dl
; dl = 0
  1b:   be 01 00 00 00          mov    $0x1,%esi
  20:   40 80 c7 ff             add    $0xff,%dil
  24:   48 19 f1                sbb    %rsi,%rcx
; rcx-rsi = 0 - 1 = 0xff...ff, CF = 1
  27:   0f 92 c1                setb   %cl
; cl = 1
  2a:   0f b6 d2                movzbl %dl,%edx
  2d:   80 c1 ff                add    $0xff,%cl ; cl = 0; CF = 1
  30:   48 19 c2                sbb    %rax,%rdx
; rdx = -1; CF = 1
  33:   0f 92 c1                setb   %cl
  36:   0f b6 c9                movzbl %cl,%ecx
  39:   48 89 c8                mov    %rcx,%rax
  3c:   ba 03 00 00 00          mov    $0x3,%edx
  41:   48 89 ca                mov    %rcx,%rdx
  44:   09 d0                   or     %edx,%eax
  46:   ba 01 00 00 00          mov    $0x1,%edx
  4b:   29 c2                   sub    %eax,%edx
  4d:   89 d0                   mov    %edx,%eax
  4f:   c3                      retq   


With -fpeephole2:

0000000000000000 <main>:

   0:   31 c0                   xor    %eax,%eax
   2:   31 d2                   xor    %edx,%edx
   4:   80 c2 ff                add    $0xff,%dl
   7:   48 89 c1                mov    %rax,%rcx
   a:   48 11 c1                adc    %rax,%rcx
   d:   0f 92 c2                setb   %dl
; dl = 0




; cl = 0
  10:   0f b6 d2                movzbl %dl,%edx
  13:   31 c9                   xor    %ecx,%ecx ; cl = 0; CF = 0
  15:   48 19 c2                sbb    %rax,%rdx
; rdx = 0; CF = 0
  18:   0f 92 c1                setb   %cl

  1b:   48 89 c8                mov    %rcx,%rax
  1e:   ba 03 00 00 00          mov    $0x3,%edx
  23:   48 89 ca                mov    %rcx,%rdx
  26:   09 d0                   or     %edx,%eax
  28:   ba 01 00 00 00          mov    $0x1,%edx
  2d:   29 c2                   sub    %eax,%edx
  2f:   89 d0                   mov    %edx,%eax
  31:   c3                      retq
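
A note for anyone reading the listings: the "add $0xff,%cl" and
"add $0xff,%dl" instructions appear to be how the 0/1 carry byte is moved back
into CF before the following adc/sbb, since adding 0xff to a byte carries out
exactly when the byte is nonzero. In the -fpeephole2 listing that step before
the last sbb is replaced by "xor %ecx,%ecx", which clears CF. The sketch below
is my reading of the listings, not something taken from the GCC sources, and
the function names are hypothetical. It models the flag trick and shows how the
exit code depends on the CF value that reaches the last sbb.

```c
typedef unsigned long long u64;

/* CF produced by the 8-bit instruction "add $0xff, reg8". */
static unsigned char cf_after_add_0xff(unsigned char reg8) {
    unsigned int wide = (unsigned int)reg8 + 0xffu;   /* 9-bit result */
    return (unsigned char)((wide >> 8) & 1u);         /* bit 8 is the carry */
}

/* Borrow out of a - b - cf_in, i.e. CF after "sbb" (cf_in is 0 or 1). */
static unsigned char borrow_out(unsigned char cf_in, u64 a, u64 b) {
    return (unsigned char)((a < b) || ((a - b) < (u64)cf_in));
}

/* Exit code of the reproducer as a function of the CF entering the last sbb,
   which computes 0 - 0 - CF in both listings. */
static int exit_code_for_incoming_cf(unsigned char cf_in) {
    u64 x = borrow_out(cf_in, 0, 0);
    return (int)(1 - (x | x));            /* z1 and z2 both end up equal to x */
}
```

With the byte cl = 1 from the preceding setb, cf_after_add_0xff gives CF = 1
and the exit code is 0. With CF cleared by the xor, the exit code is 1. That
is consistent with the two results reported above.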
