https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68718

            Bug ID: 68718
           Summary: libgomp c.exp with -fipa-pta hang in sort-1.c
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Created attachment 36932
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36932&action=edit
minimal sort-1.c

The test hangs, because in this loop, we optimize away the test on busy:
...
                    while (size_int_pair_stack (&global_stack) == 0
                           && busy)
                      busy_wait ();
...

Before the PR46043 fix, the loop looks like this just before fre2:
...
;;   basic block 27, loop depth 3, count 0, freq 10000, maybe hot
;;    prev block 26, next block 28, flags: (NEW, REACHABLE)
;;    pred:       26 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                29 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_10 = PHI <.MEM_88(26), .MEM_92(29)>
  # VUSE <.MEM_10>
  # PT = nonlocal unit-escaped 
  _89 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].global_stackD.3576;
  # VUSE <.MEM_10>
  # USE = nonlocal unit-escaped { D.3656 } (escaped)
  _90 = size_int_pair_stackD.3436 (_89);
  if (_90 == 0)
    goto <bb 28>;
  else
    goto <bb 20>;
;;    succ:       28 [95.5%]  (TRUE_VALUE,EXECUTABLE)
;;                20 [4.5%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 28, loop depth 3, count 0, freq 9550, maybe hot
;;    prev block 27, next block 29, flags: (NEW, REACHABLE)
;;    pred:       27 [95.5%]  (TRUE_VALUE,EXECUTABLE)
  # VUSE <.MEM_10>
  _91 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].busyD.3574;
  if (_91 != 0)
    goto <bb 29>;
  else
    goto <bb 20>;
;;    succ:       29 [95.5%]  (TRUE_VALUE,EXECUTABLE)
;;                20 [4.5%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 29, loop depth 3, count 0, freq 9120, maybe hot
;;    prev block 28, next block 30, flags: (NEW, REACHABLE)
;;    pred:       28 [95.5%]  (TRUE_VALUE,EXECUTABLE)
  # .MEM_92 = VDEF <.MEM_10>
  # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
  # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
  busy_waitD.3439 ();
  goto <bb 27>;
;;    succ:       27 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
...

And the fix for PR46032 has the following effect:
...
 ;;   basic block 27, loop depth 3, count 0, freq 10000, maybe hot
 ;;    prev block 26, next block 28, flags: (NEW, REACHABLE)
 ;;    pred:       26 [100.0%]  (FALLTHRU,EXECUTABLE)
 ;;                29 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
   # .MEM_10 = PHI <.MEM_88(26), .MEM_92(29)>
   # VUSE <.MEM_10>
-  # PT = nonlocal unit-escaped 
+  # PT = { D.3474 } (escaped)
   _89 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].global_stackD.3576;
   # VUSE <.MEM_10>
-  # USE = nonlocal unit-escaped { D.3656 } (escaped)
+  # USE = { D.3474 D.3656 } (escaped)
   _90 = size_int_pair_stackD.3436 (_89);
   if (_90 == 0)
     goto <bb 28>;
   else
     goto <bb 20>;
 ;;    succ:       28 [95.5%]  (TRUE_VALUE,EXECUTABLE)
 ;;                20 [4.5%]  (FALSE_VALUE,EXECUTABLE)

 ;;   basic block 28, loop depth 3, count 0, freq 9550, maybe hot
 ;;    prev block 27, next block 29, flags: (NEW, REACHABLE)
 ;;    pred:       27 [95.5%]  (TRUE_VALUE,EXECUTABLE)
   # VUSE <.MEM_10>
   _91 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].busyD.3574;
   if (_91 != 0)
     goto <bb 29>;
   else
     goto <bb 20>;
 ;;    succ:       29 [95.5%]  (TRUE_VALUE,EXECUTABLE)
 ;;                20 [4.5%]  (FALSE_VALUE,EXECUTABLE)

 ;;   basic block 29, loop depth 3, count 0, freq 9120, maybe hot
 ;;    prev block 28, next block 30, flags: (NEW, REACHABLE)
 ;;    pred:       28 [95.5%]  (TRUE_VALUE,EXECUTABLE)
   # .MEM_92 = VDEF <.MEM_10>
-  # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
-  # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
+  # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3656 } (escaped)
+  # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3656 } (escaped)
   busy_waitD.3439 ();
   goto <bb 27>;
 ;;    succ:       27 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
...

AFICT, the new alias information is correct.

And the optimization by fre looks reasonable: the variable busy does not seem
to be modified by anything in the loop, so it removes the test on busy from the
loop.

But in fact, busy is a shared variable, so it can be modified by one of the
other threads:
...
  int busy = 1;
  int num_threads;

  omp_init_lock (&lock);
  init_int_pair_stack (&global_stack);
#pragma omp parallel firstprivate (array, count)
  {
....

In other words, inside the parallel region, num_threads behaves as a volatile
variable (it may return different values when read twice, while nothing seems
to be writing to it inbetween the two reads).

So AFAIU, we need to mark the uses of shared variables in openmp regions as
volatile.

Reply via email to