Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202838, @tra wrote:

> In https://reviews.llvm.org/D50845#1202551, @ABataev wrote:
>
> > In https://reviews.llvm.org/D50845#1202550, @Hahnfeld wrote:
> >
> > > In https://reviews.llvm.org/D50845#1202540, @ABataev wrote:
> > >
> > > > Maybe for device compilation we also should define `__NO_MATH_INLINES` 
> > > > and `__NO_STRING_INLINES` macros to disable inline assembly in glibc?
> > >
> > >
> > > The problem is that `__NO_MATH_INLINES` doesn't even avoid all inline 
> > > assembly in `bits/mathinline.h` :-( Incidentally, Clang already defines 
> > > `__NO_MATH_INLINES` for x86 (due to an old bug that was fixed long ago), 
> > > and on CentOS we still have the problems described in PR38464.
> > >
> > > As a second thought: This might be valid for NVPTX, but I don't think 
> > > it's a good idea for x86-like offloading targets - they might well profit 
> > > from inline assembly code.
> >
> >
> > I'm not saying that we should define those macros for all targets, only for 
> > NVPTX. But still, it may disable some inline assembly for other 
> > architectures.
>
>
> IMO, trying to avoid inline assembly by defining (or not defining) some 
> macros and hoping for the best is rather fragile, as we'll have to chase 
> *all* patches that the host's math.h may have on any given system.


Completely agree here: this patch tries to pick the low-hanging fruit that 
happens to fix `#include <math.h>` on most systems (and addresses a 
long-standing `FIXME` in the code). I know there are more headers that define 
inline assembly unconditionally and will need more advanced fixes (see below).

> If I understand it correctly, the root cause of this exercise is that we want 
> to compile for GPU using plain C. CUDA avoids this issue by separating device 
> and host code via target attributes, and clang has a few special cases to 
> ignore inline assembly errors in the host code if we're compiling for the 
> device. For OpenMP there's no such separation, at least not in the system 
> headers.

Yes, that's one of the nice properties of CUDA (for the compiler). There used 
to be the same restriction for OpenMP, where all functions used in `target` 
regions had to be put in `declare target`. However, that was relaxed in favor 
of implicitly marking all **called** functions in that TU as `declare target`.
So ideally, I think Clang should determine which functions really are `declare 
target` (either explicitly or implicitly) and only run semantic analysis on 
those. If a function is then found to be "broken", it's perfectly fine to 
report an error back to the user.


Repository:
  rC Clang

https://reviews.llvm.org/D50845


