Anastasia added a comment.

I think we should add a test case when the same block is both called and 
enqueued.



================
Comment at: lib/CodeGen/CGOpenCLRuntime.cpp:113
+
+llvm::Value *CGOpenCLRuntime::emitOpenCLEnqueuedBlock(CodeGenFunction &CGF,
+                                                      const Expr *E) {
----------------
yaxunl wrote:
> Anastasia wrote:
> > yaxunl wrote:
> > > Anastasia wrote:
> > > > I am not particularly in favour of duplicating CodeGen functionality as 
> > > > it typically has so many special cases that are hard to catch. Is this 
> > > > logic needed in order to pass to block literal information  that the 
> > > > block is enqueued?
> > > This code is needed to emit separate functions for a block which is 
> > > directly called and also enqueued as a kernel. Since the kernel needs to 
> > > have proper calling convention and ABI, it cannot be emitted as the same 
> > > function as when the block is called directly. Since it is OpenCL 
> > > specific code, I found it is cleaner to separate this code as member of 
> > > CGOpenCLRuntime instead of fitting it into CGF.EmitBlockLiteral.
> > This part is replacing standard `EmitScalarExpr` call which is doing 
> > several things before calling into block generation. That's why I am a bit 
> > worried we are covering all the corner cases here.
> > 
> > So if we transform all blocks into kernels unconditionally we won't need 
> > this special handling then?
> > 
> > Do we generate two versions of the blocks now: one for enqueue and one for 
> > call?
> If we transform all blocks into kernels, we could simplify the logic. 
> Probably will not need this special handling.
> 
> However, when the block is called directly, the first argument is a private 
> pointer, when it is executed as a kernel, the first argument is a global 
> pointer or a struct (for amdgpu target), therefore the emitted functions 
> cannot be the same.
Would using generic address space for this first argument not work?


================
Comment at: test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl:3
+
+// CHECK: %[[S1:struct.__amdgpu_block_arg_t.*]] = type { [3 x i64], [1 x i8] }
+// CHECK: %[[S2:struct.__amdgpu_block_arg_t.*]] = type { [5 x i64], [1 x i8] }
----------------
yaxunl wrote:
> Anastasia wrote:
> > This struct is not identical to block literal struct?
> The LLVM type of the first argument of block invoke function is created 
> directly with sorting and rearrangement. There is no AST type corresponding 
> to it. However, the function codegen requires AST type of this argument. I 
> feel it is unnecessary to create the corresponding AST type. For simplicity, 
> just create an AST type with the same size and alignment as the LLVM type. In 
> the function code, it will be bitcasted to the correct LLVM struct type and 
> get the captured variables.
So `void ptr` won't be possible here? Since it is cast to a right struct inside 
the block anyway. Once again a block is a special type object with known 
semantic to compiler and runtime in contract to kernels that can be written 
with any arbitrary type of arguments.

I just don't like the idea of duplicating the block invoke function in case 
it's being both called and enqueued. Also the login in blocks code generation 
becomes difficult to understand. So I am wondering if we could perhaps create a 
separate kernel function (as a wrapper) for enqueue_kernel which would call a 
block instead. What do you think about it? I think the kernel prototype would 
be fairly generic as it would just have a block call inside and pass the 
arguments into it... We won't need to modify block generation then at all. 


https://reviews.llvm.org/D38134



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to