On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> I've opted not to use dynamic parallelism.  It increases the hardware
> requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link

I'll try to add the thread_limit/num_teams arguments to GOMP_target_41
soon (together with the target teams clause evaluation changes), so
sometimes you'll have that information at target time, but not always.
Using teams/thread preallocation when possible is fine with me, but I think
it is not always possible: you may not be able to see what number of teams
or what thread_limit the teams construct will want, or thread_limit may be
unspecified and you have no idea how many threads will be requested...
I think requiring sm_35 should not be a very big deal.

> time (libcudadevrt.a), and imposes overhead at run time.  The last point might

But if this is the case, that is a really serious issue.  Is that really
something that isn't available in a shared library?
E.g. with my distro GCC maintainer hat on, I'd really like to tweak the
libgomp PTX plugin, so that it compiles against a stub cuda.h header and
doesn't link against libcuda*.so at all, but instead dlopens it, to avoid
hard dependencies on the non-free CUDA stuff and more importantly any link
time dependencies on that.  If libcudadevrt is not
available as a shared library, this of course wouldn't work.  Would be nice to
talk to NVIDIA about this...
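
To make the dlopen idea concrete, here is a minimal sketch of what such a
loader could look like.  It is not the actual libgomp plugin code: the
struct cuda_lib type and the load_cuda helper are hypothetical names, and
the CUresult typedef merely stands in for what a stub cuda.h would
declare.  Only cuInit is resolved here; a real plugin would look up every
driver entry point it uses in the same way.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins for declarations a stub cuda.h would provide (assumption:
   the real header declares CUresult as an enum, with 0 meaning success).  */
typedef int CUresult;
typedef CUresult (*cuInit_fn) (unsigned int);

/* Hypothetical table of driver entry points resolved at run time.  */
struct cuda_lib
{
  void *handle;
  cuInit_fn cuInit;
};

/* Try to load the driver library named SONAME and resolve cuInit.
   Returns 1 on success, 0 if the library or the symbol is missing, in
   which case the caller can simply report that no device is available.  */
static int
load_cuda (struct cuda_lib *lib, const char *soname)
{
  assert (lib != NULL && soname != NULL);

  lib->cuInit = NULL;
  lib->handle = dlopen (soname, RTLD_LAZY);
  if (lib->handle == NULL)
    return 0;

  lib->cuInit = (cuInit_fn) dlsym (lib->handle, "cuInit");
  if (lib->cuInit == NULL)
    {
      dlclose (lib->handle);
      lib->handle = NULL;
      return 0;
    }
  return 1;
}
```

A caller would use something like load_cuda (&lib, "libcuda.so.1") and
fall back to host execution when it returns 0.  On older glibc this needs
-ldl at link time, but nothing from the CUDA Toolkit.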

> libgomp.c/thread-limit-2.c: fails to link due to 'usleep' unavailable on
> NVPTX.  Note, the test does not run anything on the device because the target
> region has 'if (0)' clause.

As an optimization, perhaps we could avoid adding the "omp target entrypoint"
attribute for the body of an if(0) target region; that one always goes to host
fallback, so no offloaded code is needed.
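
For illustration, this is the kind of region I mean.  It is a made-up
example, not the thread-limit-2.c source, and sum_on_host is a
hypothetical name.  Because the if clause is a constant false, the body
always runs on the host, so compiling it for the device only creates
link-time requirements there.

```c
/* Sum N ints from V.  The constant-false if clause forces the host
   fallback, so this region is never executed on an offload device.
   Without -fopenmp the pragma is ignored and the loop runs on the host
   just the same.  */
static int
sum_on_host (const int *v, int n)
{
  int sum = 0;

  #pragma omp target if (0) map(to: v[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += v[i];

  return sum;
}
```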

As for other tests, XFAILing them always is undesirable.  Presumably we could
add a dejagnu target check for whether the default target goes to PTX (if we
don't have it already) and use that to xfail?  Of course that doesn't help
the thread-limit-2.c testcase.

        Jakub
