https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69625
            Bug ID: 69625
           Summary: deadlock in libgomp.c/doacross-1.c test
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vogt at linux dot vnet.ibm.com
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---
            Target: s390x

Created attachment 37554
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37554&action=edit
.s file of test program

On s390x with -march=z196 -O2/-O3 the test hangs with a deadlock (and also doacross-[2.3].c and doacross-1.C, but I haven't looked at them yet). I've stripped down the test to this:

-- snip --
#include <stdio.h>

#define N 64

int b[N / 16][8][4];

int main ()
{
  int i, j, k, l;
  (void)l;
#pragma omp parallel
  {
    printf("+++\n");
#pragma omp for schedule(static, 0) ordered (3) nowait
    for (i = 2; i < N / 16 - 1; i++)
      for (j = 0; j < 8; j += 2)
        for (k = 1; k <= 3; k++)
          {
#pragma omp atomic write
            b[i][j][k] = 111111;
#pragma omp ordered depend(sink: i, j - 2, k - 1) \
                    depend(sink: i - 2, j - 2, k + 1)
#pragma omp ordered depend(sink: i - 3, j + 2, k - 2)
            if (j >= 2 && k > 1)
              {
#pragma omp atomic read
                l = b[i][j - 2][k - 1];
              }
#pragma omp atomic write
            b[i][j][k] = 222222;
            if (i >= 4 && j >= 2 && k < 3)
              {
#pragma omp atomic read
                l = b[i - 2][j - 2][k + 1];
              }
#pragma omp ordered depend(source)
#pragma omp atomic write
            b[i][j][k] = 333333;
          }
    printf("---\n");
  }
  printf("done\n");
  return 0;
}
-- snip --

(See attachment for full .s file.)

(Running on an LPAR with 17 cores inside gdb.)

The function GOMP_parallel starts threads 2 to 17, which enter and leave the parallel region (they print both "+++" and "---") and then hang in a team_barrier_wait_final() call in gomp_thread_start. Only then does thread 1 run the thread function:
  gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
  fn (data);

Thread 1 comes across

  0x0000000080000b7a <+522>: brasl %r14,0x800007b0 <GOMP_doacross_wait@plt>

with %r10 == 2 (which presumably contains k), then continues through

  0x0000000080000cf6 <+902>: brasl %r14,0x800006f0 <GOMP_doacross_post@plt>

and finally comes back to

  0x0000000080000b7a <+522>: brasl %r14,0x800007b0 <GOMP_doacross_wait@plt>

with %r10 == 3. In GOMP_doacross_wait() it ends up calling doacross_spin() and never gets out of that again:

  doacross_spin (array, flattened, cur);
  0x000003fff7ef5562 <+282>: lg %r1,0(%r5)
  0x000003fff7ef5568 <+288>: clgr %r1,%r2
  0x000003fff7ef556c <+292>: jle 0x3fff7ef5562 <GOMP_doacross_wait+282>

The value of r1 (= *r5 (= *array?)) remains 6 (since there's no other thread left that could modify it) while the value of r2 is 0xfffffffb4a1. To me this looks as if doacross_spin() compares an integer value with an address or rubbish.

Any ideas what's going on?
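
For illustration, the three-instruction loop quoted above can be modelled in C roughly as follows. This is only a sketch of what the disassembly appears to do, not the libgomp source: the name spin_until_greater and the max_iters bound are made up here so that the example terminates, and the mapping of %r5 and %r2 to "addr" and "expected" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the disassembled loop (not the libgomp code):
     lg   %r1,0(%r5)   load the 64-bit value at *addr
     clgr %r1,%r2      unsigned 64-bit compare with expected
     jle  <loop>       repeat while *addr <= expected
   max_iters is an addition of this sketch; the real loop is unbounded.
   Returns true once *addr > expected, false if the bound is reached.  */
static bool
spin_until_greater (const volatile uint64_t *addr, uint64_t expected,
                    uint64_t max_iters)
{
  assert (addr != NULL);
  for (uint64_t n = 0; n < max_iters; n++)
    {
      uint64_t cur = *addr;   /* re-read on every iteration */
      if (cur > expected)
        return true;          /* the jle would fall through here */
    }
  return false;               /* still spinning when the bound ran out */
}
```

With the values seen in gdb (*addr == 6, expected == 0xfffffffb4a1) and no other thread left to update *addr, the exit condition can never become true, which would match the observed hang.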