[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

Artem Belevich via Phabricator via cfe-commits Mon, 09 May 2022 13:11:32 -0700

tra added inline comments.


================
Comment at: clang/lib/CodeGen/CGCUDARuntime.h:58
+      /// Mark the entry as a global variable.
+      OffloadGlobalEntry = 0x0,
+      /// Mark the entry as a managed global variable.
----------------
jhuber6 wrote:
> tra wrote:
> > jhuber6 wrote:
> > > tra wrote:
> > > > jhuber6 wrote:
> > > > > tra wrote:
> > > > > > We're still using the same numeric value for two different kinds of 
> > > > > > entities.
> > > > > > Considering that it's the third round we're making around this 
> > > > > > point, I'm starting to suspect that I may be missing something.
> > > > > > Is there a particular reason kernels and global unmanaged variables 
> > > > > > have to have the same 'kind'?
> > > > > > 
> > > > > > It's possible that I didn't do a good job explaining my 
> > > > > > enthusiastic nitpicking here.
> > > > > > My suggestion to have unified enum for **all** entities we register 
> > > > > > is based on a principle of separation of responsibilities. If we 
> > > > > > want to know what kind of entry we're dealing with, checking the 
> > > > > > 'kind' field should be sufficient. The 'size' field should only 
> > > > > > indicate the size of the entity. Having to consider both kind and 
> > > > > > size to determine what you're dealing with just muddies things and 
> > > > > > should not be done unless there's a good reason for that. E.g. it 
> > > > > > might be OK if we were short on flag bits.
> > > > > > 
> > > > > > 
> > > > > Ah, I see the point you're making now. This is yet another thing that 
> > > > > OpenMP did that I just copied. I wouldn't have implemented it this 
> > > > > way but I figured it would be simpler to keep them similar. I mostly 
> > > > > did it this way because I did some initial tests of registering and 
> > > > > accessing CUDA globals in OpenMP and it required using the same flags 
> > > > > for the kernels and globals. We could change it for CUDA in the 
> > > > > future and I could make that change here if it's valuable. Ideally I 
> > > > > would like to rewrite how we do all this registration with the 
> > > > > structs but breaking the ABI makes it complicated...
> > > > >  I did some initial tests of registering and accessing CUDA globals 
> > > > > in OpenMP and it required using the same flags for the kernels and 
> > > > > globals.
> > > > 
> > > > OK. So, there is something that requires this magic. If that's 
> > > > something we must have, then it must be mentioned in the comments 
> > > > around the enum.
> > > > 
> > > > Do you know where I should find the code which needs this? I'm curious 
> > > > what's going on there.
> > > > I wonder if it just checks for "flags==0" and refuses to deal with 
> > > > unknown flags. 
> > > > To think of it, we probably want to put the enum into a common header 
> > > > which defines the `__tgt_offload_entry`.We would not want OpenMP itself 
> > > > to start using the same bits for something else.
> > > Sorry, I should be more specific. The OpenMP offloading runtime currently 
> > > uses a size of zero to indicate a kernel function and the flags have a 
> > > different meaning if it's a kernel. For OpenMP, 0 is a kernel, 1 and 2 
> > > are device ctors / dtors. I'm not sure why they chose this over just 
> > > another flag but it's the current standard. You can see it used like this 
> > > here 
> > > https://github.com/llvm/llvm-project/blob/main/openmp/libomptarget/src/omptarget.cpp#L147.
> > >  I'm not sure if there's a good way to wrangle these together now that I 
> > > think about it, considering OpenMP already uses `0x1` to represent `link` 
> > > OpenMP variables so this already collides. But treating the flags 
> > > different on the size is at least consistent with what OpenMP does. It 
> > > makes it a little hard to define one enum for it since we use it two 
> > > different ways, I'm not a fan of it but it's what the current ABI uses.
> > I see.
> > 
> > Using `size=0` as the coda/data flag which changes interpretation of the 
> > flags sort of makes sense. In that case two different types for the flags 
> > field would be appropriate, with an appropriate comment describing that 
> > `size==0` determines which one is in effect.
> > 
> Personally I'm find with it landing like this, and if we wanted to improve 
> this later it would probably just go in some greater ABI break for offloading 
> entries. There might be a good reason to change them all at once when we 
> start focusing more on complete interoperability of offloading languages.
I'm fine with that. Just add a comment describing how `OffloadGlobalEntry` is 
used for both code and data and that the size is used to distinguish them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123471/new/

https://reviews.llvm.org/D123471

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

Reply via email to