hfinkel added a comment.

In D60907#1483660 <https://reviews.llvm.org/D60907#1483660>, @jdoerfert wrote:

> In D60907#1483615 <https://reviews.llvm.org/D60907#1483615>, @hfinkel wrote:
>
> > In D60907#1479370 <https://reviews.llvm.org/D60907#1479370>, @gtbercea 
> > wrote:
> >
> > > In D60907#1479142 <https://reviews.llvm.org/D60907#1479142>, @hfinkel 
> > > wrote:
> > >
> > > > In D60907#1479118 <https://reviews.llvm.org/D60907#1479118>, @gtbercea 
> > > > wrote:
> > > >
> > > > > Ping @hfinkel @tra
> > > >
> > > >
> > > > The last two comments in D47849 <https://reviews.llvm.org/D47849> 
> > > > indicated exploration of a different approach, and one which still 
> > > > seems superior to this one. Can you please comment on why you're now 
> > > > pursuing this approach instead?
> > >
> > >
> > > ...
> > >
> > > Hal, as far as I can tell, this solution is similar to yours but with a 
> > > slightly different implementation. If there are particular aspects of 
> > > this patch you would like to discuss or give feedback on, please let me 
> > > know.
> >
> >
> > The solution I suggested had the advantages of:
> >
> > 1. Being able to directly reuse the code in 
> > `__clang_cuda_device_functions.h`. On the other hand, using this solution 
> > we need to implement a wrapper function for every math function. When 
> > `__clang_cuda_device_functions.h` is updated, we need to update the OpenMP 
> > wrapper as well.
>
>
> I'd even go as far as to argue that `__clang_cuda_device_functions.h` should 
> include the internal math.h wrapper to get all math functions. See also the 
> next comment.
>
> > 2. Providing access to wrappers for other CUDA intrinsics in a natural way 
> > (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d 
> > than __nv_rnorm3d in user code].
>
> @hfinkel 
>  I don't see why you want to mix CUDA intrinsics with math.h overloads.


What I had in mind was matching non-standard functions in a standard way. For 
example, say I have a CUDA kernel that uses the rnorm3d function, or I 
otherwise have a function that I'd like to write in OpenMP that would make good 
use of this CUDA function (because it happens to have an efficient device 
implementation). This is a function that CUDA provides in the global namespace, 
although it's not standard.

Then I can do something like this (depending on how we set up the 
implementation):

  double rnorm3d(double a, double b, double c) {
    return sqrt(a*a + b*b + c*c);
  }
  
  ...
  
  #pragma omp target
  {
    double a = ..., b = ..., c = ...;
    double r = rnorm3d(a, b, c);
  }

and, if we use the CUDA math headers for CUDA math-function support, then this 
might "just work." To be clear, I can see an argument that having this work is 
a bad idea ;) -- but it has the advantage of providing a way to take advantage 
of system-specific functions while still writing completely portable code.

>   I added a rough outline of how I imagine the internal math.h header would 
> look as a comment in D47849. Could you elaborate on how that differs from 
> what you imagine and how the other intrinsics come in?

That looks like what I had in mind (including `__clang_cuda_device_functions.h` 
to get the device functions).
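
For concreteness, here's a heavily simplified sketch of the kind of wrapper I'm 
picturing. The guard macros and the exact include set are just illustrative 
assumptions, and the details of making the CUDA headers accept OpenMP device 
compilation (e.g., the __CUDA__ / __device__ handling) are glossed over:

  // Hypothetical math.h wrapper (sketch only; names and guards are illustrative).
  #if defined(_OPENMP) && defined(__NVPTX__)
  // Declarations of the libdevice entry points (__nv_sin, __nv_pow, ...).
  #include <__clang_cuda_libdevice_declares.h>
  // Device-side implementations built on top of them.
  #include <__clang_cuda_device_functions.h>
  #endif
  // Chain to the system math.h for the regular declarations.
  #include_next <math.h>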

> 
> 
>> 3. Being similar to the "declare variant" functionality from OpenMP 5, and 
>> thus, I suspect, closer to the solution we'll eventually be able to apply in 
>> a standard way to all targets.
> 
> I can see this.
> 
>>> This solution follows Alexey's suggestions. It allows the optimization of 
>>> math calls where they apply (for example, pow(x,2) => x*x), which was one 
>>> of the issues in the previous solution I implemented.
>> 
>> So we're also missing that optimization for CUDA code when compiling with 
>> Clang? Isn't this also something that, regardless, should be fixed?
> 
> Maybe through general built-in recognition and lowering into target-specific 
> implementations/intrinsics late again?

I suspect that we need to match the intrinsics and perform the optimizations at 
the LLVM level in order to get them for CUDA as well.
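
To make that concrete, the difference I have in mind is roughly the following 
(a sketch; __nv_pow is the libdevice function that a header-based mapping would 
end up calling):

  #include <math.h>
  
  // libdevice entry point (normally declared by the CUDA support headers).
  double __nv_pow(double, double);
  
  double f(double x) {
    // LLVM's library-call simplification understands 'pow' and can turn
    // pow(x, 2.0) into x * x (subject to the usual errno/fast-math caveats).
    return pow(x, 2.0);
  }
  
  double g(double x) {
    // If the frontend lowers the call directly to __nv_pow, LLVM just sees an
    // opaque external call, so the pow(x, 2) => x*x simplification never
    // fires; we'd have to teach LLVM to recognize the __nv_* functions to get
    // it back.
    return __nv_pow(x, 2.0);
  }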


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60907/new/

https://reviews.llvm.org/D60907


