jlebar added inline comments.
================ Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32 + +#define __DEVICE__ static __device__ __forceinline__ +// There are number of functions that became compiler builtins in CUDA-9 and are ---------------- tra wrote: > jlebar wrote: > > I don't think we should need `__forceinline__`? I'm sure they had that in > > their headers, but clang doesn't rely on its inliner for correctness, > > unlike some *other* compilers. :) > > > > I'm also not sure about `static`. I think we mark all CUDA code as > > internal, so it shouldn't be necessary? If they're not static, they should > > be marked as `inline`, though. > The idea is to make those wrappers always inlined, so they are effectively > replaced with a call to __nv_* libdevice function. Whether the callee itself > get inlined is controlled by the attributes in the libdevice bitcode. > > `__attribute__((always_inline))` which `__forceinline__` expands to is > handled differently from regular inline. We have > Transforms/IPO/AlwaysInliner.cpp which handles them with -O1. Higher opt > levels will presumably inline these functions, too. Without always_inline > these wrappers will remain uninlined at -O1. Replacing it with just `inline` > will work OK (only affects -O1 in a minor way), but `__forceinline__` is a > bit closer to what I want from these wrappers. > > As for the `static`, unused non-static functions will be emitted by clang -- > as far as clang is concerned they are externally visible. I've checked and we > do end up emitting them in PTX, even though they don't have .visible > directive (I think that's what you referred to when you mentioned 'mark as > internal'). I don't think we want that. > > I'll keep this combination for now. We can adjust it later, if necessary. Can you help me understand what makes you want this particular inlining behavior (i.e., always inline, even at -O0) for these wrappers? Does e.g. libc++ do the same thing for the functions it has that are thin wrappers around e.g. libc functions? > As for the static, unused non-static functions will be emitted by clang -- as > far as clang is concerned they are externally visible. I've checked and we do > end up emitting them in PTX, even though they don't have .visible directive > (I think that's what you referred to when you mentioned 'mark as internal'). > I don't think we want that. Agree. This is kind of unfortunate, though -- it's not How You Write Headers. Could we add a comment? ================ Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:144 +// Declare or define device-side functions that particular CUDA version relies +// on but does (no longer) declare in its headers. E.g. some of the device-side +// functions that used to be implemented in a header in CUDA-8, became NVCC's ---------------- tra wrote: > jlebar wrote: > > Sentence doesn't read right if you skip the parens. > > > > But also: Our header provides *everything* from > > cuda/device_functions_decls.h, not just the things that were once in that > > header and, in newer CUDA versions, are no longer there, right? I'm > > actually even more confused now that I read the big header, because here we > > say the header "defines or declares" these functions, but that suggests > > that for every function, it decides whether to define or to declare it, but > > that's not the case... > I should rephrase that then. > > `__clang_cuda_device_functions.h` does two jobs. It always includes > `__clang_cuda_libdevice_declares.h` which provides **declarations** for the > libdevice functions. Those are gone in CUDA-9.1. > > It also provides **definitions** for the standard library and __* device-side > functions that call libdevice. > > I can explicitly include _clang_cuda_libdevice_declares.h here and update > the comment to make more sense. I think that would help clarify things, for sure. https://reviews.llvm.org/D42513 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits