hfinkel added a comment.

We need to make progress on this, and I'd like to suggest a path forward...

First, we have a fundamental problem here: Using host headers to declare 
functions for the device execution environment isn't sound. Those host headers 
can do anything, and while some platforms might provide a way to make the host 
headers more friendly (e.g., by defining __NO_MATH_INLINES), these mechanisms 
are neither robust nor portable. Thus, we should not rely on host headers to 
define functions that might be available on the device. However, even when 
compiling for the device, code meant only for host execution must be 
semantically analyzable. This, in general, requires the host headers. So we 
have a situation in which we must use the host headers during device
compilation (to keep the semantic analysis of the surrounding host code
working), yet can't use them to provide definitions for device code: those
host headers might provide definitions that rely on host inline asm or
intrinsics or that use types not lowerable in device code, might provide
declarations with linkage-affecting attributes not lowerable for the device,
and so on.
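
To make the inline-asm case concrete, here is a hypothetical example of the
kind of inline definition a host math header could provide. The name
`host_fast_sqrt` is invented for illustration and is not taken from any real
header. On an x86-64 host it compiles to an SSE2 `sqrtsd` instruction through
inline asm. If a definition like this were called from device code, there
would be no way to lower that asm string for an NVPTX target.

```cpp
#include <cassert>  // for the checks shown in the usage note
#include <cmath>

// Hypothetical stand-in for an inline definition a host math header might
// provide; host_fast_sqrt is an illustrative name only.
static inline double host_fast_sqrt(double x) {
#if defined(__x86_64__)
  // x86-64 inline asm (AT&T syntax: sqrtsd src, dst). Fine for the host,
  // but this asm string cannot be lowered for an NVPTX (device) target.
  double r;
  __asm__("sqrtsd %1, %0" : "=x"(r) : "x"(x));
  return r;
#else
  // Other hosts: fall back to the portable library call so the sketch builds.
  return std::sqrt(x);
#endif
}
```

For example, `assert(host_fast_sqrt(2.25) == 1.5);` holds on the host. The
`#else` branch exists only so the sketch also builds on non-x86-64 hosts.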

This is the problem, or something very close to it, that host/device
overloading addresses in CUDA, and also the problem that the new OpenMP 5
`declare variant` directive is intended to address. Johannes and I discussed
this earlier today, and I suggest that we:

1. Add a math.h wrapper to clang/lib/Headers, which generally just does an
include_next of math.h but gives us the ability to customize this behavior.
Writing a header for OpenMP on NVIDIA GPUs that is essentially identical to the
math.h functions in __clang_cuda_device_functions.h would be unfortunate and,
since CUDA provides the underlying execution environment for OpenMP target
offload on NVIDIA GPUs, duplicative even in principle. We don't need to alter
the default global namespace, however; instead, we can include this file from
the wrapper math.h.
2. We should allow host/device overloading in OpenMP mode. As an extension, we
could directly reuse the CUDA host/device overloading capability; this also
has the advantage of allowing us to directly reuse
__clang_cuda_device_functions.h (and perhaps to do something similar to pick up
the device-side printf, etc., from __clang_cuda_runtime_wrapper.h). In the
future, if desired, we can extend this to provide overloading using OpenMP
declare variant when in OpenMP mode.
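
For item 1, a minimal sketch of what such a wrapper could look like follows.
It assumes the wrapper is found ahead of the system headers on the include
path, so that `#include_next <math.h>` reaches the host's own math.h. It also
assumes that `_OPENMP` (defined in OpenMP mode) together with `__NVPTX__`
(defined by Clang when targeting NVPTX) is a reasonable test for OpenMP device
compilation on NVIDIA GPUs. In practice, __clang_cuda_device_functions.h
likely relies on CUDA macros such as `__device__` that the CUDA wrappers
normally set up, and its definitions would likely need something like the
host/device overloading of item 2 to coexist with the host declarations. Treat
this as an outline of the structure, not a working header.

```cpp
/* Hypothetical sketch of a math.h wrapper in clang/lib/Headers. */

/* Forward to the next math.h on the include path (the host's), so host code
   in the translation unit still sees the declarations it expects, even
   during device compilation. */
#include_next <math.h>

/* Only for OpenMP device compilation targeting NVIDIA GPUs: reuse the
   device-side math definitions that already exist for CUDA rather than
   writing a near-identical OpenMP-specific header. */
#if defined(_OPENMP) && defined(__NVPTX__)
#include <__clang_cuda_device_functions.h>
#endif
```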
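
For item 2, this is roughly what the existing capability looks like in Clang's
CUDA mode (compiled with something like `clang++ -x cuda`). The attributes are
spelled the way the `__host__`, `__device__`, and `__global__` macros expand,
and the names `twice`, `kernel`, and `host_call` are invented for illustration.
Two functions with the same signature coexist because one is host-only and the
other device-only, and each call resolves according to the context it appears
in. How device code would be marked under the proposed OpenMP-mode extension is
an open question, so this shows only the CUDA form.

```cpp
// Host-only and device-only overloads with identical signatures. Clang's CUDA
// mode accepts this; the caller's context picks the overload.
__attribute__((host)) double twice(double x) { return x + x; }

// A real device overload might instead forward to a device library routine.
__attribute__((device)) double twice(double x) { return 2.0 * x; }

// Kernel (device context): this call resolves to the device overload.
__attribute__((global)) void kernel(double *out, double in) { *out = twice(in); }

// Unannotated functions are host functions: this resolves to the host overload.
double host_call(double in) { return twice(in); }
```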

Thoughts?


Repository:
  rC Clang

https://reviews.llvm.org/D47849



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits