yaxunl marked 2 inline comments as done.
yaxunl added inline comments.

================
Comment at: clang/test/CodeGenCUDA/lambda.cu:16-26
+template<class F>
+__global__ void g(F f) { f(); }
+
+template<class F>
+void h(F f) { g<<<1,1>>>(f); }
+
+__device__ int a;
----------------
tra wrote:
> yaxunl wrote:
> > tra wrote:
> > > The test example may not be doing what it's seemingly supposed to be 
> > > doing:
> > > https://cppinsights.io/s/3a5c42ff
> > > 
> > > `h()` gets a temporary host-side object which keeps the reference to `a` 
> > > and that reference will actually point to the host-side shadow of the 
> > > actual device-side `a`. When you get to execute `g` it's `this` may not 
> > > be very usable on device side and thus `f.operator()` will probably not 
> > > work.
> > > 
> > > Alas, we currently have no diagnostics for that kind of error.
> > > 
> > > Change it to a non-capturing lambda, perhaps?
> > It works.
> > 
> > We need to think about this in device compilation. In device compilation, 
> > global variable is a device variable, the lambda is a device host function, 
> > therefore the lambda is accessing the real a, not the shadow.
> > 
> > In the host compilation, the lambda is not really called, therefore it is 
> > not emitted.
> > 
> > I will update the lit test with these checks.
> Clang manages to see through to the initializer of `a`, but I'm not sure how 
> much we can rely on this.
> In general, `f.operator()` for a capturing lambda needs to access captured 
> variables via `this` which points to a temporary objects created and passed 
> to `g` by the host. You can see it if you capture a local variable: 
> https://godbolt.org/z/99389o
> 
> Anyways, it's an issue orthogonal to this patch. My concern is that tests are 
> often used as an example of things that are OK to do, and capturing lambdas 
> are a pretty big foot-shooting gun when used with CUDA. It's very easy to do 
> wrong thing without compiler complaining about them.
> While accessing `a` does work, it appears to do so by accident, rather than 
> by design. 
> 
> I'm fairly confident that I can hide the initializer with sufficiently 
> complicated code, force clang to access `a` via `this` and make everything 
> fail at runtime. IMO, what we have here is a 'happens to work' situation. I 
> do not want to call it 'guaranteed to work' without making sure that it 
> always does.
> 
> In order to demonstrate that lambda is host/device, you do not need it to be 
> a capturing lambda. You can make it call an overloaded function with host and 
> device variants and verify that the lambda works on host and device sides.
> 
> 
I added one more test, where a lambda function calls a template function which 
is overloaded with a host version and a device version. The lambda is called in 
both host function and in kernel. Test shows correct version of template 
function are emitted in host and device compilation.

I think it is not a surprise that the lambda function is able to resolve the 
host/device-ness of the callee correctly. We are doing resolution in a host 
device function and the two candidates are same-side vs wrong-side.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78655/new/

https://reviews.llvm.org/D78655



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to