jlebar added inline comments.

================
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32
+
+#define __DEVICE__ static __device__ __forceinline__
+// There are number of functions that became compiler builtins in CUDA-9 and 
are
----------------
tra wrote:
> jlebar wrote:
> > tra wrote:
> > > jlebar wrote:
> > > > I don't think we should need `__forceinline__`?  I'm sure they had that 
> > > > in their headers, but clang doesn't rely on its inliner for 
> > > > correctness, unlike some *other* compilers.  :)
> > > > 
> > > > I'm also not sure about `static`.  I think we mark all CUDA code as 
> > > > internal, so it shouldn't be necessary?  If they're not static, they 
> > > > should be marked as `inline`, though.
> > > The idea is to make those wrappers always inlined, so they are 
> > > effectively replaced with a call to __nv_* libdevice function. Whether 
> > > the callee itself get inlined is controlled by the attributes in the 
> > > libdevice bitcode.
> > > 
> > > `__attribute__((always_inline))` which `__forceinline__` expands to is 
> > > handled differently from regular inline. We have 
> > > Transforms/IPO/AlwaysInliner.cpp which handles them with -O1. Higher opt 
> > > levels will presumably inline these functions, too. Without always_inline 
> > > these wrappers will remain uninlined at -O1. Replacing it with just 
> > > `inline` will work OK (only affects -O1 in a minor way), but 
> > > `__forceinline__` is a bit closer to what I want from these wrappers.
> > > 
> > > As for the `static`, unused non-static functions will be emitted by clang 
> > > -- as far as clang is concerned they are externally visible. I've checked 
> > > and we do end up emitting them in PTX, even though they don't have 
> > > .visible directive (I think that's what you referred to when you 
> > > mentioned 'mark as internal'). I don't think we want that.
> > > 
> > > I'll keep this combination for now. We can adjust it later, if necessary.
> > Can you help me understand what makes you want this particular inlining 
> > behavior (i.e., always inline, even at -O0) for these wrappers?
> > 
> > Does e.g. libc++ do the same thing for the functions it has that are thin 
> > wrappers around e.g. libc functions?
> > 
> > > As for the static, unused non-static functions will be emitted by clang 
> > > -- as far as clang is concerned they are externally visible. I've checked 
> > > and we do end up emitting them in PTX, even though they don't have 
> > > .visible directive (I think that's what you referred to when you 
> > > mentioned 'mark as internal'). I don't think we want that.
> > 
> > Agree.  This is kind of unfortunate, though -- it's not How You Write 
> > Headers.  Could we add a comment?
> libc++ does seem to use _LIBCPP_ALWAYS_INLINE for a lot of wrapper-like 
> functions:
> 
> E.g. libcxx/include/__locale:
> ```
> 535:    _LIBCPP_ALWAYS_INLINE
> 536-    const char_type* tolower(char_type* __low, const char_type* __high) 
> const
> 537-    {
> 538-        return do_tolower(__low, __high);
> 539-    }
> ```
> 
> 
sgtm then!


https://reviews.llvm.org/D42513



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to