On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> I've opted not to use dynamic parallelism.  It increases the hardware
> requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link

I'll try to add the thread_limit/num_teams arguments to GOMP_target_41 soon
(together with the target teams clause evaluation changes), so sometimes
you'll have that information at target time, but not always.
Using teams/thread preallocation when possible is fine with me, but I think
it is not always possible: you may not be able to see what number of teams
or what thread_limit the teams construct will want, or thread_limit may be
unspecified, in which case you have no idea how many threads will be
requested...
I think requiring sm_35 should not be a very big deal.

> time (libcudadevrt.a), and imposes overhead at run time.  The last point might

But if this is the case, that is a really serious issue.  Is that really
something that isn't available in a shared library?
E.g. with my distro GCC maintainer hat on, I'd really like to tweak
the libgomp PTX plugin so that it compiles against a stub cuda.h header
and doesn't link against libcuda*.so at all, but instead dlopens it, to
avoid hard dependencies on the non-free CUDA stuff and, more importantly,
any link-time dependencies on it.  If libcudadevrt is not available as
a shared library, this of course wouldn't work.
Would be nice to talk to NVidia about this...

> libgomp.c/thread-limit-2.c: fails to link due to 'usleep' unavailable on
> NVPTX.  Note, the test does not run anything on the device because the target
> region has 'if (0)' clause.

As an optimization, perhaps we could avoid adding the "omp target entrypoint"
attribute for the body of an if(0) target region; that one always goes to
host fallback, so no offloaded code is needed.

As for the other tests, XFAILing them always is undesirable; presumably we
could add a dejagnu target check for whether the default target goes to PTX
(if we don't have it already) and use that to xfail?  Of course that doesn't
help the thread-limit-2.c testcase.

	Jakub
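
P.S. To make the dlopen idea above concrete, here is a rough sketch of what
I have in mind.  It is not the actual plugin code: the struct, the typedefs
and cuda_lib_load are hypothetical names made up for illustration, and a real
stub cuda.h would have to declare the driver types itself.  The only real
names assumed are the driver entry point cuInit and the POSIX
dlopen/dlsym/dlclose calls.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical stand-ins for what a stub cuda.h would declare.  */
typedef int stub_CUresult;
typedef stub_CUresult (*stub_cuInit_fn) (unsigned int);

/* Hypothetical table of driver entry points resolved at run time.  */
struct cuda_lib
{
  void *handle;
  stub_cuInit_fn init;
};

/* Try to load the driver library SONAME and resolve cuInit from it.
   Return 1 on success.  Return 0 if the library or the symbol is
   missing, leaving LIB cleared so the caller can treat the device
   as unavailable and use host fallback.  */
static int
cuda_lib_load (struct cuda_lib *lib, const char *soname)
{
  lib->init = NULL;
  lib->handle = dlopen (soname, RTLD_LAZY);
  if (lib->handle == NULL)
    return 0;

  /* POSIX allows converting the dlsym result to a function pointer.  */
  lib->init = (stub_cuInit_fn) dlsym (lib->handle, "cuInit");
  if (lib->init == NULL)
    {
      dlclose (lib->handle);
      lib->handle = NULL;
      return 0;
    }
  return 1;
}
```

The plugin would call something like cuda_lib_load (&lib, "libcuda.so.1")
once at initialization and report zero devices on failure, so nothing
non-free is needed at build or link time.  This only works for code that
exists as a shared library, which is why a static-only libcudadevrt.a would
be a problem.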