https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68718
Bug ID: 68718 Summary: libgomp c.exp with -fipa-pta hang in sort-1.c Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Created attachment 36932 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36932&action=edit minimal sort-1.c The test hangs, because in this loop, we optimize away the test on busy: ... while (size_int_pair_stack (&global_stack) == 0 && busy) busy_wait (); ... Before the PR46043 fix, the loop looks like this just before fre2: ... ;; basic block 27, loop depth 3, count 0, freq 10000, maybe hot ;; prev block 26, next block 28, flags: (NEW, REACHABLE) ;; pred: 26 [100.0%] (FALLTHRU,EXECUTABLE) ;; 29 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) # .MEM_10 = PHI <.MEM_88(26), .MEM_92(29)> # VUSE <.MEM_10> # PT = nonlocal unit-escaped _89 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base 1].global_stackD.3576; # VUSE <.MEM_10> # USE = nonlocal unit-escaped { D.3656 } (escaped) _90 = size_int_pair_stackD.3436 (_89); if (_90 == 0) goto <bb 28>; else goto <bb 20>; ;; succ: 28 [95.5%] (TRUE_VALUE,EXECUTABLE) ;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE) ;; basic block 28, loop depth 3, count 0, freq 9550, maybe hot ;; prev block 27, next block 29, flags: (NEW, REACHABLE) ;; pred: 27 [95.5%] (TRUE_VALUE,EXECUTABLE) # VUSE <.MEM_10> _91 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base 1].busyD.3574; if (_91 != 0) goto <bb 29>; else goto <bb 20>; ;; succ: 29 [95.5%] (TRUE_VALUE,EXECUTABLE) ;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE) ;; basic block 29, loop depth 3, count 0, freq 9120, maybe hot ;; prev block 28, next block 30, flags: (NEW, REACHABLE) ;; pred: 28 [95.5%] (TRUE_VALUE,EXECUTABLE) # .MEM_92 = VDEF <.MEM_10> # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650 D.3653 D.3654 D.3656 D.3807 } (escaped) # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650 D.3653 D.3654 D.3656 D.3807 } (escaped) busy_waitD.3439 (); goto <bb 27>; ;; succ: 27 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) ... And the fix for PR46032 has the following effect: ... ;; basic block 27, loop depth 3, count 0, freq 10000, maybe hot ;; prev block 26, next block 28, flags: (NEW, REACHABLE) ;; pred: 26 [100.0%] (FALLTHRU,EXECUTABLE) ;; 29 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) # .MEM_10 = PHI <.MEM_88(26), .MEM_92(29)> # VUSE <.MEM_10> - # PT = nonlocal unit-escaped + # PT = { D.3474 } (escaped) _89 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base 1].global_stackD.3576; # VUSE <.MEM_10> - # USE = nonlocal unit-escaped { D.3656 } (escaped) + # USE = { D.3474 D.3656 } (escaped) _90 = size_int_pair_stackD.3436 (_89); if (_90 == 0) goto <bb 28>; else goto <bb 20>; ;; succ: 28 [95.5%] (TRUE_VALUE,EXECUTABLE) ;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE) ;; basic block 28, loop depth 3, count 0, freq 9550, maybe hot ;; prev block 27, next block 29, flags: (NEW, REACHABLE) ;; pred: 27 [95.5%] (TRUE_VALUE,EXECUTABLE) # VUSE <.MEM_10> _91 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base 1].busyD.3574; if (_91 != 0) goto <bb 29>; else goto <bb 20>; ;; succ: 29 [95.5%] (TRUE_VALUE,EXECUTABLE) ;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE) ;; basic block 29, loop depth 3, count 0, freq 9120, maybe hot ;; prev block 28, next block 30, flags: (NEW, REACHABLE) ;; pred: 28 [95.5%] (TRUE_VALUE,EXECUTABLE) # .MEM_92 = VDEF <.MEM_10> - # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650 D.3653 D.3654 D.3656 D.3807 } (escaped) - # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650 D.3653 D.3654 D.3656 D.3807 } (escaped) + # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3656 } (escaped) + # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3656 } (escaped) busy_waitD.3439 (); goto <bb 27>; ;; succ: 27 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) ... AFICT, the new alias information is correct. And the optimization by fre looks reasonable: the variable busy does not seem to be modified by anything in the loop, so it removes the test on busy from the loop. But in fact, busy is a shared variable, so it can be modified by one of the other threads: ... int busy = 1; int num_threads; omp_init_lock (&lock); init_int_pair_stack (&global_stack); #pragma omp parallel firstprivate (array, count) { .... In other words, inside the parallel region, num_threads behaves as a volatile variable (it may return different values when read twice, while nothing seems to be writing to it inbetween the two reads). So AFAIU, we need to mark the uses of shared variables in openmp regions as volatile.