https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85653

            Bug ID: 85653
           Summary: [nvptx] Work around subsequent bar.sync JIT/ptxas bug
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01023.html :
...
Hi,

when compiling this testcase with the og7 branch:
...
int
main (void)
{
  long long v1;
#pragma acc parallel num_gangs (640) num_workers(1) vector_length (128)
#pragma acc loop
  for (v1 = 0; v1 < 20; v1 += 2)
    ;

  return 0;
}
...

this ptx is generated:
...
{

  // fork 4;
  bar.sync 0;
  // forked 4;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
...

This triggers some bug on my Quadro M1200 (I'm assuming in the ptxas/JIT
compiler) that hangs the testcase. I can work around this by adding a
membar.cta before the bar.sync, or two membar.ctas in between, but I'm not
really sure what a minimal workaround should look like (I reported the bug to
Nvidia, and I'm hoping for them to answer that question).
...

Nvidia came back confirming the workaround of two membar.ctas in between the
bar.syncs.
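
For illustration, a sketch of how the ptx above might look with that
workaround applied. The exact position of the two membar.cta instructions
relative to the fork/join comment markers is an assumption here; the only
thing confirmed is that they go between the two bar.sync instructions:
...
{

  // fork 4;
  bar.sync 0;
  // forked 4;
  // assumed placement: anywhere between the two bar.syncs
  membar.cta;
  membar.cta;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
...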

We'll want to implement this, even though this shouldn't be triggering on
either og7 or trunk.
