On 12/02/15 12:09, Alexander Monakov wrote:
> I meant the PTX linked (post PTX-JIT link) image, so regardless of support, it's not an issue. E.g. check early in gomp_nvptx_main if .weak __nvptx_has_simd != 0. It would only break if there was dlopen on PTX.
Note that I found a bug in the .weak support. See the comment in gcc.dg/special/weak-2.c:

/* NVPTX's implementation of weak is broken when a strong symbol is in a later object file than the weak definition. */
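To make the weak-flag idea concrete, here is a minimal host-side sketch in C. It is only an illustration under stated assumptions: the name nvptx_has_simd_sketch is hypothetical (it stands in for __nvptx_has_simd), it uses GCC's weak attribute on the host rather than a PTX .weak directive, and image_has_simd is a made-up helper, not anything in libgomp.

```c
/* Hypothetical flag standing in for __nvptx_has_simd.
   This is a weak definition, so its default value is 0. */
__attribute__((weak)) int nvptx_has_simd_sketch = 0;

/* An object file that contains SIMD regions would carry the strong
   definition:
       int nvptx_has_simd_sketch = 1;
   and the linker is expected to prefer it over the weak one. */

/* Hypothetical helper: nonzero when the linked image claims to
   contain SIMD regions. */
static int image_has_simd(void)
{
    return nvptx_has_simd_sketch != 0;
}
```

With no strong definition linked in, image_has_simd() returns 0. The scheme relies on the strong definition winning regardless of object order, which is exactly where the bug noted above would bite.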
> That's not enough: you have to reach the SIMD region entry in threads 1-31, which means they need to execute all preceding control flow like thread 0, which means they need to compute controlling predicates like thread 0. (OpenACC broadcasts controlling predicates at branches)
Indeed. Hence the partial 'forking' before a call to a function with internal partitioned execution.
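A rough way to picture the predicate issue is the following host-side model in C. It is a sketch, not PTX and not the actual compiler output: WARP_SIZE, the function names, and the choice to model an uncomputed predicate as false are all assumptions made for illustration.

```c
#include <stdbool.h>

#define WARP_SIZE 32  /* assumed warp width for this model */

/* Lane 0 evaluates the predicate; every lane receives a copy,
   modelling a broadcast at the branch. */
static void broadcast_predicate(bool lane0_value, bool pred[WARP_SIZE])
{
    for (int lane = 0; lane < WARP_SIZE; lane++)
        pred[lane] = lane0_value;
}

/* Count the lanes that take the branch leading to the SIMD region
   entry when the controlling predicate is broadcast. */
static int lanes_reaching_simd_broadcast(int x)
{
    bool pred[WARP_SIZE];
    int reached = 0;

    broadcast_predicate(x > 0, pred);
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (pred[lane])
            reached++;
    return reached;
}

/* Same branch, but only lane 0 has computed the predicate; the other
   lanes are modelled as holding false, so they never reach the entry. */
static int lanes_reaching_simd_lane0_only(int x)
{
    bool pred[WARP_SIZE] = { false };
    int reached = 0;

    pred[0] = (x > 0);
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (pred[lane])
            reached++;
    return reached;
}
```

In this model, lanes_reaching_simd_broadcast(5) counts all 32 lanes, while lanes_reaching_simd_lane0_only(5) counts just one, which is the situation the broadcast (or the partial forking) is meant to avoid.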
nathan