jdoerfert added a comment.

In D79744#2047482 <https://reviews.llvm.org/D79744#2047482>, @arsenm wrote:

> For the purpose here, only the callee exists. This is essentially a 
> freestanding function, the entry point to the program. There is no caller 
> function, and in the future I would like to make a call to amdgpu_kernel an 
> IR verifier error (technically OpenCL device enqueue is an exception to this, 
> but we don't treat this as a call. Instead there's a lot of library magic to 
> invoke the kernel. From the perspective of the callee nothing changes, since 
> it's still not allowed to modify the incoming argument buffer, nor is it aware 
> that it was called this way).


Did you consider a callback annotation for the device-enqueue call? While that 
might not change anything *now*, I expect interesting optimization 
opportunities there at some point "soon".

> The load-from-constant nature needs to be exposed earlier, so I think this 
> necessarily involves changing the convention lowering in some way and it's 
> just a question of what it looks like. To summarize, the 2.5 options I've 
> come up with are:
> 
> 1. Use constant byval parameters, as this patch does. This requires the least 
> implementation effort but doesn't exactly fit in with byval as defined.

And, as was noted in the `byval` lang ref patch (D79636 
<https://reviews.llvm.org/D79636>), there is a reasonable argument to be made 
for phasing out `byval` in favor of some explicit copying. If that happens, this 
solution should not be "the last `byval` user". Also, `byval` arguments could 
be used as a scratchpad by smart optimizations. I still don't believe this is a 
great choice, but by now I can see that the others aren't either.
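
For the record, my reading of this option in IR terms is roughly the sketch 
below (struct name and fields are made up; address space 4 is the AMDGPU 
constant address space, and it assumes constant `byval` is accepted, which is 
exactly what this patch is about):

  %struct.Args = type { i32, float addrspace(1)* }

  ; Sketch of option 1: the aggregate kernel argument is passed byval through
  ; a pointer into the constant address space, so every use is a load from
  ; read-only memory.
  define amdgpu_kernel void @kern(%struct.Args addrspace(4)* byval(%struct.Args) %args) {
  entry:
    %n.ptr = getelementptr inbounds %struct.Args, %struct.Args addrspace(4)* %args, i32 0, i32 0
    %n = load i32, i32 addrspace(4)* %n.ptr, align 4
    ; ... kernel body using %n ...
    ret void
  }

In particular, with plain `byval` semantics an optimization would be allowed to 
store into %args, which is exactly the scratchpad problem mentioned above.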

> 2. Replace all IR argument uses with loads from a constant offset from an 
> intrinsic call. This still needs to leave the IR arguments in place because 
> we do need to know the original argument sizes and offsets, but they would 
> never have a use (or I would need to invent some other method of tracking 
> this information)
> 3. Keep clang IR generation unchanged, but move the pass that lowers 
> arguments to loads earlier and hack out aggregate IR loads before SROA makes 
> things worse. This is really just a kludgier version of option 2. We do 
> ultimately do this late in the backend to enable vectorization, but it does 
> seem to make the middle-end optimizer unhappy.
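
Just to check my understanding of option 2, a rough sketch follows, using 
llvm.amdgcn.kernarg.segment.ptr as a stand-in for whatever intrinsic ends up 
being used, and with a made-up offset:

  declare i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()

  ; Sketch of option 2: %x stays in the signature so its original size and
  ; offset remain known, but it is otherwise dead; every use is rewritten as a
  ; load at a fixed offset from the kernarg segment base. Address space 4 is
  ; the constant address space, so the read-only nature is visible to the
  ; middle end.
  define amdgpu_kernel void @kern(i32 %x) {
  entry:
    %base = call i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
    %x.raw = getelementptr i8, i8 addrspace(4)* %base, i64 0   ; offset of %x (made up)
    %x.ptr = bitcast i8 addrspace(4)* %x.raw to i32 addrspace(4)*
    %x.val = load i32, i32 addrspace(4)* %x.ptr, align 4
    ; ... kernel body uses %x.val instead of %x ...
    ret void
  }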




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79744/new/

https://reviews.llvm.org/D79744


