Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

Bernd Schmidt Tue, 20 Oct 2015 13:56:48 -0700

On 10/20/2015 08:34 PM, Alexander Monakov wrote:

On NVPTX, there are 16 hardware barriers for each thread team, and each
barrier has a variable waiter count.  The instruction 'bar.sync N, M;' makes
a thread wait on barrier number N until M threads have arrived.  M should be
pre-multiplied by the warp width.  It's also possible to 'post' the barrier
without suspending, using 'bar.arrive'.
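The sync/arrive semantics described above can be made concrete with a small host-side C model of a single hardware barrier. This is a sketch that uses pthreads as a stand-in for the GPU. It is not PTX and not part of the patch. `hw_bar_model_sync` mimics 'bar.sync N, M;' (arrive, then wait until M arrivals), and `hw_bar_model_arrive` mimics 'bar.arrive' (count towards M without suspending). All names are hypothetical. The model also counts abstract participants, whereas real PTX counts threads in multiples of the warp width.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical host-side model of ONE PTX hardware barrier (not GPU code). */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned arrived;   /* arrivals in the current phase */
  unsigned phase;     /* bumped each time the barrier completes */
} hw_bar_model;

#define HW_BAR_MODEL_INIT \
  { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Shared arrival logic; WAIT selects sync-like (1) or arrive-like (0). */
static void hw_bar_model_arrive_common (hw_bar_model *b, unsigned expected,
                                        int wait)
{
  pthread_mutex_lock (&b->lock);
  unsigned my_phase = b->phase;
  if (++b->arrived == expected)
    {
      /* Last arrival: complete the barrier, reset it, release waiters.  */
      b->arrived = 0;
      b->phase++;
      pthread_cond_broadcast (&b->cond);
    }
  else if (wait)
    {
      /* Block until the phase advances (loop guards spurious wakeups).  */
      while (b->phase == my_phase)
        pthread_cond_wait (&b->cond, &b->lock);
    }
  pthread_mutex_unlock (&b->lock);
}

/* Models 'bar.sync N, M;': arrive, then wait for M arrivals in total.  */
static void hw_bar_model_sync (hw_bar_model *b, unsigned expected)
{
  hw_bar_model_arrive_common (b, expected, 1);
}

/* Models 'bar.arrive N, M;': count towards M without suspending.  */
static void hw_bar_model_arrive (hw_bar_model *b, unsigned expected)
{
  hw_bar_model_arrive_common (b, expected, 0);
}

/* Demo: three waiters use the sync form; the caller posts with arrive.  */
static hw_bar_model demo_bar = HW_BAR_MODEL_INIT;

static void *demo_waiter (void *arg)
{
  (void) arg;
  hw_bar_model_sync (&demo_bar, 4);
  return NULL;
}

/* Returns the number of completed phases (1 if everyone was released).  */
static unsigned hw_bar_model_demo (void)
{
  pthread_t t[3];
  for (int i = 0; i < 3; i++)
    pthread_create (&t[i], NULL, demo_waiter, NULL);
  hw_bar_model_arrive (&demo_bar, 4);  /* fourth arrival, non-blocking */
  for (int i = 0; i < 3; i++)
    pthread_join (t[i], NULL);         /* would hang if never completed */
  assert (demo_bar.arrived == 0);      /* barrier reset for reuse */
  return demo_bar.phase;
}
```

The arrival order does not matter. Whichever participant arrives fourth completes the phase and wakes the others.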


We should be able to provide the gomp barrier via a combination of ptx
barriers and atomics.  This patch is a first step in that direction.
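The atomics half of such a combination can be sketched as a centralized generation-counting barrier. The example below is a hypothetical host-side C11 illustration. It is neither the patch nor libgomp's Linux code. Each thread decrements a shared counter. The last arriver re-arms the counter and bumps a generation number, and the others spin until the generation changes. One assumption, not something the patch states: a combined NVPTX design might replace the spin loop with a hardware barrier and keep the atomic only to elect the last arriver.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical centralized barrier built only from atomics (host sketch).  */
typedef struct {
  unsigned total;          /* number of participating threads */
  atomic_uint awaited;     /* threads still to arrive in this round */
  atomic_uint generation;  /* bumped by the last arriver of each round */
} sketch_bar;

/* Returns 1 in exactly one thread per round (the last to arrive), else 0.  */
static int sketch_bar_wait (sketch_bar *b)
{
  /* Read the generation first; it cannot advance before we arrive.  */
  unsigned gen = atomic_load_explicit (&b->generation, memory_order_acquire);
  if (atomic_fetch_sub_explicit (&b->awaited, 1, memory_order_acq_rel) == 1)
    {
      /* Last arriver: re-arm the counter, then publish the new generation.  */
      atomic_store_explicit (&b->awaited, b->total, memory_order_relaxed);
      atomic_fetch_add_explicit (&b->generation, 1, memory_order_release);
      return 1;
    }
  /* Everyone else waits for the generation to change (host busy-wait).  */
  while (atomic_load_explicit (&b->generation, memory_order_acquire) == gen)
    sched_yield ();
  return 0;
}

/* Demo: 4 threads, 100 rounds; every round must see all 4 arrivals.  */
#define SKETCH_THREADS 4
#define SKETCH_ROUNDS 100

static sketch_bar sketch_demo_bar = { SKETCH_THREADS, SKETCH_THREADS, 0 };
static atomic_uint arrivals[SKETCH_ROUNDS];
static atomic_uint last_count;

static void *sketch_worker (void *arg)
{
  (void) arg;
  for (int r = 0; r < SKETCH_ROUNDS; r++)
    {
      atomic_fetch_add (&arrivals[r], 1);
      if (sketch_bar_wait (&sketch_demo_bar))
        atomic_fetch_add (&last_count, 1);
      /* After the barrier, all increments for round r must be visible.  */
      assert (atomic_load (&arrivals[r]) == SKETCH_THREADS);
    }
  return NULL;
}

/* Returns how many times a thread reported "last"; expect one per round.  */
static unsigned sketch_demo (void)
{
  pthread_t t[SKETCH_THREADS];
  for (int i = 0; i < SKETCH_THREADS; i++)
    pthread_create (&t[i], NULL, sketch_worker, NULL);
  for (int i = 0; i < SKETCH_THREADS; i++)
    pthread_join (t[i], NULL);
  return atomic_load (&last_count);
}
```

The return value from the wait function lets exactly one thread per round perform follow-up work after the barrier.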

It's mostly a copy of the Linux implementation, and it's very likely that
functions more complex than gomp_barrier_wait_end are implemented incorrectly.
I will have to review all of that (and, hopefully, optimize it).

I'm not sure whether naked asm()s are OK.  It would be possible to implement
a builtin instead, as a minor beautification.  Thoughts?
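A naked-asm wrapper might look roughly like the sketch below. The names `nvptx_bar_sync`, `nvptx_bar_arrive` and `nvptx_bar_count` are hypothetical, and the warp width of 32 is an assumption. The asm is guarded by `__nvptx__`, which we assume the nvptx compiler predefines. On a host build, only the count helper is compiled and exercised.

```c
#include <assert.h>

/* Assumption: 32-thread warps on the targeted GPUs.  */
#define NVPTX_WARP_SIZE 32u

/* The thread-count operand must be a multiple of the warp width, so a
   caller that counts warps pre-multiplies here.  */
static inline unsigned nvptx_bar_count (unsigned nwarps)
{
  return nwarps * NVPTX_WARP_SIZE;
}

#ifdef __nvptx__
/* Naked inline asm.  The "memory" clobber also keeps the compiler from
   moving loads and stores across the barrier.  */
static inline void nvptx_bar_sync (unsigned id, unsigned nwarps)
{
  __asm__ volatile ("bar.sync %0, %1;"
                    : : "r" (id), "r" (nvptx_bar_count (nwarps)) : "memory");
}

static inline void nvptx_bar_arrive (unsigned id, unsigned nwarps)
{
  __asm__ volatile ("bar.arrive %0, %1;"
                    : : "r" (id), "r" (nvptx_bar_count (nwarps)) : "memory");
}
#endif
```

A builtin would mainly let the compiler model the barrier itself instead of relying on a "volatile" asm with a memory clobber.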

I have no concerns about naked asms. I'm more concerned about whether this
actually works: how much testing has this had? My experience has been that
there is practically no way of using bar.sync reliably. We can't control
warp divergence and reconvergence at the ptx level, but the hardware
bar.sync instruction only works when it is executed by all threads in a warp
at the same time.

Bernd
