[ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
with SIMT LANE [PR95654] ]

On 9/16/20 8:20 PM, Alexander Monakov wrote:
> 
> 
> On Wed, 16 Sep 2020, Tom de Vries wrote:
> 
>> [ cc-ing author omp support for nvptx. ]
> 
> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> recognized it too for their GPU targets). In an attempt to get agreement
> to fix the issue "properly" for GCC I found a similar issue that affects
> all targets, not just offloading, and filed it as PR 80053.
> 
> (yes, there are no addressable labels involved in offloading, but nevertheless
> the nature of the middle-end issue is related)

Hi Alexander,

thanks for looking into this.

Seeing that the attempt to fix things properly is stalled, for now I'm
proposing a point-fix, similar to the original patch proposed by Tobias.

Richi, Jakub, OK for trunk?

Thanks,
- Tom

[omp, ftracer] Don't duplicate blocks in SIMT region

When running the libgomp testsuite on x86_64-linux with nvptx accelerator,
we run into:
...
FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             v
|             *
v             bb8
*<------------*
bb5(VOTE_ANY)
*-------------+
|             |
|             |
|             |
|             |
|             v
|             *
v             bb7(XCHG_IDX)
*<------------*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an instruction that requires the executing warp to be
convergent.  The warp diverges at bb4 and reconverges at bb5, and it does not
diverge again on the way to bb7, so the shfl is indeed executed by a
convergent warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             \
|              \
v               v
*               *
bb5(VOTE_ANY)   bb8(VOTE_ANY)
*               *
|\             /|
| \  +--------+ |
|  \/           |
|  /\           |
| /  +----------v
|/              *
v               bb7(XCHG_IDX)
*<--------------*
bb6(EXIT)
...

The warp diverges again at bb4, but does not reconverge before bb6, so
the shfl in bb7 is executed by a divergent warp, which causes the FAIL.
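
To make the difference between the two CFGs concrete, here is a toy model in
plain C.  It is only a sketch, not GCC or nvptx code: lanes are bits in a
32-bit mask, and it assumes that lanes separated by a branch run together
again only at a block that post-dominates that branch.  The names
convergent_p, xchg_mask_before and xchg_masks_after are made up for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FULL_WARP 0xffffffffu	/* One bit per lane, 32 lanes.  */

/* In this model a mask is "convergent" if bb7 is either skipped (no lanes)
   or executed by all lanes of the warp at once.  */
static int
convergent_p (uint32_t mask)
{
  return mask == 0 || mask == FULL_WARP;
}

/* CFG before ftracer.  VIA_BB8 is the set of lanes that go through bb8.
   bb5 post-dominates the branch in bb4, so the model reconverges all lanes
   there, whatever VIA_BB8 is.  VOTE_ANY is then evaluated once over the
   whole warp, and its result is the same in every lane.
   Returns the mask of lanes that execute bb7 (XCHG_IDX) together.  */
static uint32_t
xchg_mask_before (uint32_t via_bb8, uint32_t pred)
{
  (void) via_bb8;
  int any = (pred & FULL_WARP) != 0;
  return any ? FULL_WARP : 0;
}

/* CFG after ftracer.  bb5 and its copy in bb8 are each executed by one of
   the two groups created in bb4, and nothing post-dominates bb4 before bb6.
   Each group evaluates VOTE_ANY over its own lanes only.
   OUT[0] is the mask with which bb7 is entered from bb5, OUT[1] the mask
   with which it is entered from bb8.  */
static void
xchg_masks_after (uint32_t via_bb8, uint32_t pred, uint32_t out[2])
{
  assert (out != NULL);
  uint32_t group[2] = { FULL_WARP & ~via_bb8, via_bb8 };
  for (int i = 0; i < 2; i++)
    {
      int any = (pred & group[i]) != 0;
      out[i] = any ? group[i] : 0;
    }
}
```

With via_bb8 = 0x0000ffff and pred = 0x1, xchg_mask_before returns the full
warp mask, while xchg_masks_after reports that bb7 is entered from bb8 with
only the low 16 lanes active, which is the situation the shfl cannot handle.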

Fix this by making ftracer ignore blocks containing ENTER_ALLOC and EXIT,
effectively treating the SIMT region conservatively.

One could argue that the EXIT and VOTE_ANY can be generated by omp-low in
reverse order, in which case the VOTE_ANY could be duplicated.  This is the
reason VOTE_ANY is not explicitly listed as ignored in this patch.

An argument can also be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.

Built on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

	PR fortran/95654
	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC
	and GOMP_SIMT_EXIT.

---
 gcc/tracer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 82ede722534..de80416f163 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -108,6 +108,16 @@ ignore_bb_p (const_basic_block bb)
 	return true;
     }
 
+  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *g = gsi_stmt (gsi);
+      if (is_gimple_call (g)
+	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+	      || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)))
+	return true;
+    }
+
   return false;
 }
 
