jlebar added inline comments.
================ Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32 + +#define __DEVICE__ static __device__ __forceinline__ +// There are number of functions that became compiler builtins in CUDA-9 and are ---------------- tra wrote: > jlebar wrote: > > tra wrote: > > > jlebar wrote: > > > > I don't think we should need `__forceinline__`? I'm sure they had that > > > > in their headers, but clang doesn't rely on its inliner for > > > > correctness, unlike some *other* compilers. :) > > > > > > > > I'm also not sure about `static`. I think we mark all CUDA code as > > > > internal, so it shouldn't be necessary? If they're not static, they > > > > should be marked as `inline`, though. > > > The idea is to make those wrappers always inlined, so they are > > > effectively replaced with a call to __nv_* libdevice function. Whether > > > the callee itself get inlined is controlled by the attributes in the > > > libdevice bitcode. > > > > > > `__attribute__((always_inline))` which `__forceinline__` expands to is > > > handled differently from regular inline. We have > > > Transforms/IPO/AlwaysInliner.cpp which handles them with -O1. Higher opt > > > levels will presumably inline these functions, too. Without always_inline > > > these wrappers will remain uninlined at -O1. Replacing it with just > > > `inline` will work OK (only affects -O1 in a minor way), but > > > `__forceinline__` is a bit closer to what I want from these wrappers. > > > > > > As for the `static`, unused non-static functions will be emitted by clang > > > -- as far as clang is concerned they are externally visible. I've checked > > > and we do end up emitting them in PTX, even though they don't have > > > .visible directive (I think that's what you referred to when you > > > mentioned 'mark as internal'). I don't think we want that. > > > > > > I'll keep this combination for now. We can adjust it later, if necessary. > > Can you help me understand what makes you want this particular inlining > > behavior (i.e., always inline, even at -O0) for these wrappers? > > > > Does e.g. libc++ do the same thing for the functions it has that are thin > > wrappers around e.g. libc functions? > > > > > As for the static, unused non-static functions will be emitted by clang > > > -- as far as clang is concerned they are externally visible. I've checked > > > and we do end up emitting them in PTX, even though they don't have > > > .visible directive (I think that's what you referred to when you > > > mentioned 'mark as internal'). I don't think we want that. > > > > Agree. This is kind of unfortunate, though -- it's not How You Write > > Headers. Could we add a comment? > libc++ does seem to use _LIBCPP_ALWAYS_INLINE for a lot of wrapper-like > functions: > > E.g. libcxx/include/__locale: > ``` > 535: _LIBCPP_ALWAYS_INLINE > 536- const char_type* tolower(char_type* __low, const char_type* __high) > const > 537- { > 538- return do_tolower(__low, __high); > 539- } > ``` > > sgtm then! https://reviews.llvm.org/D42513 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits