https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85653

Bug ID: 85653
Summary: [nvptx] Work around subsequent bar.sync JIT/ptxas bug
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01023.html :
...
Hi,

when compiling this testcase with the og7 branch:
...
int
main (void)
{
  long long v1;
#pragma acc parallel num_gangs (640) num_workers(1) vector_length (128)
#pragma acc loop
  for (v1 = 0; v1 < 20; v1 += 2)
    ;

  return 0;
}
...
this ptx is generated:
...
{
  // fork 4;
  bar.sync 0;
  // forked 4;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
...

This triggers a bug on my Quadro M1200 (I'm assuming in the ptxas/JIT
compiler) that hangs the testcase.

I can work around this by adding a membar.cta before the bar.sync, or two
membar.ctas in between, but I'm not really sure what a minimal workaround
should look like (I reported the bug to Nvidia, and I'm hoping they will
answer that question).
...

Nvidia came back confirming the workaround of two membar.ctas in between.

We'll want to implement this, even though this shouldn't be triggering on
either og7 or trunk.
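
For illustration, the confirmed workaround applied to the ptx above might look roughly like the sketch below. It assumes the two membar.cta instructions go directly between the two bar.sync instructions; their exact position relative to the fork/join comment markers is a guess and is not taken from any patch.

```ptx
{
  // fork 4;
  bar.sync 0;
  // forked 4;
  // Assumed placement of the workaround: two membar.cta between the
  // back-to-back bar.sync instructions.
  membar.cta;
  membar.cta;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
```

Only the back-to-back bar.sync case from the testcase is covered here; whether other instruction sequences need the same treatment is not stated in the report.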