On 10/20/2015 08:34 PM, Alexander Monakov wrote:
On NVPTX, there are 16 hardware barriers for each thread team, and each
barrier has a variable waiter count.  The instruction 'bar.sync N, M;' waits
on barrier number N until M threads have arrived.  M should be pre-multiplied
by warp width.  It's also possible to 'post' the barrier without suspending
with 'bar.arrive'.
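
To make the operand convention concrete, here is a minimal host-side sketch
in C that only formats the instruction text.  The helper name format_bar and
the two constants are hypothetical and not part of the patch; the warp width
of 32 is assumed, and the barrier range 0..15 follows from the 16 barriers
mentioned above.

```c
#include <stdio.h>

#define WARP_WIDTH   32   /* assumed threads per warp */
#define NUM_BARRIERS 16   /* barrier numbers 0..15 */

/* Hypothetical helper: write "<op> N, M;" into buf, where N is the barrier
   number and M is the waiter count in threads.  The caller passes the count
   in warps; it is pre-multiplied by the warp width here.  Returns the number
   of characters written, or -1 if an argument is out of range.  */
static int format_bar(char *buf, size_t len, const char *op, int id, int warps)
{
    if (id < 0 || id >= NUM_BARRIERS || warps <= 0)
        return -1;
    int threads = warps * WARP_WIDTH;
    return snprintf(buf, len, "%s %d, %d;", op, id, threads);
}
```

For example, format_bar(buf, sizeof buf, "bar.sync", 1, 4) produces
"bar.sync 1, 128;", and passing "bar.arrive" as the op gives the non-blocking
form with the same operands.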

We should be able to provide gomp barrier via a combination of ptx barriers
and atomics.  This patch is a first step in that direction.

It's mostly a copy of the Linux implementation, and it's very likely that
functions more complex than gomp_barrier_wait_end are implemented incorrectly.
I will have to review all of that (and optimize, hopefully).

I'm not sure if naked asm()'s are OK.  It's possible to implement a builtin
instead for a minor beautification.  Thoughts?

I have no concerns about naked asms. I'm more concerned about whether this actually works - how much testing has this had? My experience has been that there is practically no way of using bar.sync reliably: we can't control warp divergence and reconvergence at the PTX level, but the hardware bar.sync instruction only works when executed by all threads in a warp at the same time.


Bernd
