https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #11 from stli at linux dot ibm.com <stli at linux dot ibm.com> ---
Hi,
I've retested the samples with GCC 7, GCC 8, and head as of 2018-07-20, but
there are still issues.
The examples foo1 and foo2 are okay.

The issue in example foo3 is still present (see the description of the bug
report):

00000000000000a0 <foo3>:
  a0:   a7 18 00 05             lhi     %r1,5
  a4:   c4 2d 00 00 00 00       lrl     %r2,a4 <foo3+0x4>
                        a6: R_390_PC32DBL       foo3_mem+0x2

  aa:   c0 30 00 00 00 00       larl    %r3,aa <foo3+0xa>
                        ac: R_390_PC32DBL       foo3_mem+0x2
  b0:   ba 21 30 00             cs      %r2,%r1,0(%r3)
  b4:   a7 74 ff fb             jne     aa <foo3+0xa>

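The address of the global variable is still reloaded inside the loop. If cs did
not swap the value, the jne could jump directly back to the cs instruction
instead of the larl instruction: %r3 already holds the address, and a failing
cs has already loaded the current memory value into %r2.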

  b8:   b9 14 00 22             lgfr    %r2,%r2
  bc:   07 fe                   br      %r14
  be:   07 07                   nopr    %r7
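
For illustration only, the loop shape asked for above can be written by hand in
C. This is a sketch, not the foo3 source from the bug description: it assumes
the function atomically stores a constant into a global and returns the old
value, and the names sketch_mem and exchange_sketch are hypothetical. The point
is that the address and the initial value are obtained once, and a failed
compare-and-swap only needs to retry the swap itself.

```c
/* Hypothetical stand-in for the global; not the foo3_mem from the report. */
int sketch_mem;

/* Atomically store newval into sketch_mem and return the previous value. */
int exchange_sketch(int newval)
{
  int *addr = &sketch_mem;                            /* address taken once */
  int old = __atomic_load_n(addr, __ATOMIC_RELAXED);  /* initial load, once */

  /* On failure the builtin writes the current memory value into old,
     so the retry needs neither a reload nor a recomputed address.  */
  while (!__atomic_compare_exchange_n(addr, &old, newval, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    ;
  return old;
}
```

Whether the compiler emits the back-edge to the cs instruction for this form is
exactly what would need to be checked in the generated code.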

I've found a further issue, which can be observed with the following two
examples. See the questions in the disassembly:

void foo4(int *mem)
{
  int oldval = 0;
  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
                                    1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    {
      bar (mem);
    }
  /*
0000000000000000 <foo4>:
   0:   e3 10 20 00 00 12       lt      %r1,0(%r2)
   6:   a7 74 00 06             jne     12 <foo4+0x12>

Why do we need to jump to 0x12 first instead of directly jumping to 0x18?

   a:   a7 38 00 01             lhi     %r3,1
   e:   ba 13 20 00             cs      %r1,%r3,0(%r2)
  12:   a7 74 00 03             jne     18 <foo4+0x18>
  16:   07 fe                   br      %r14
  18:   c0 f4 00 00 00 00       jg      18 <foo4+0x18>
                        1a: R_390_PC32DBL       bar+0x2
  1e:   07 07                   nopr    %r7
  */
}


void foo5(int *mem)
{
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1,
                               1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);
  /*
0000000000000040 <foo5>:
  40:   e3 10 20 00 00 12       lt      %r1,0(%r2)
  46:   a7 74 00 06             jne     52 <foo5+0x12>

This is similar to foo4, but the variable oldval is compared against zero
instead of using the return value of __atomic_compare_exchange_n.
Can't we jump directly to 0x5a instead of 0x52?

  4a:   a7 38 00 01             lhi     %r3,1
  4e:   ba 13 20 00             cs      %r1,%r3,0(%r2)
  52:   12 11                   ltr     %r1,%r1
  54:   a7 74 00 03             jne     5a <foo5+0x1a>
  58:   07 fe                   br      %r14
  5a:   c0 f4 00 00 00 00       jg      5a <foo5+0x1a>
                        5c: R_390_PC32DBL       bar+0x2
   */
}
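
To look at both functions outside the bug report, a minimal self-contained
sketch could look like the following. The definition of bar and the counter
bar_calls are assumptions added here so that the file links and the behaviour
can be observed; the (void *) cast from the examples is dropped because oldval
already has the matching type. With a real, external bar the call at the end of
each function can become the tail call (jg) seen in the listings.

```c
/* Assumed stub: the report only uses bar as an external function. */
int bar_calls;
void bar(int *mem)
{
  (void) mem;
  bar_calls++;
}

/* Branches on the return value of the builtin. */
void foo4(int *mem)
{
  int oldval = 0;
  if (!__atomic_compare_exchange_n(mem, &oldval, 1,
                                   1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    bar(mem);
}

/* Branches on the value written back into oldval. */
void foo5(int *mem)
{
  int oldval = 0;
  __atomic_compare_exchange_n(mem, &oldval, 1,
                              1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar(mem);
}
```

Both functions request a weak compare-and-swap (fourth argument 1). On targets
where a weak exchange can fail spuriously, foo4 may call bar even though *mem
was zero, whereas foo5 only calls bar when a nonzero value was read back.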
