jdoerfert added a comment.

In D79744#2047482 <https://reviews.llvm.org/D79744#2047482>, @arsenm wrote:

> For the purpose here, only the callee exists. This is essentially a
> freestanding function, the entry point to the program. There is no caller
> function, and in the future I would like to make a call to amdgpu_kernel an
> IR verifier error (technically OpenCL device enqueue is an exception to this,
> but we don't treat this as a call. Instead there's a lot of library magic to
> invoke the kernel. From the perspective of the callee nothing changes, since
> it's still not allowed to modify the incoming argument buffer or aware it was
> called this way).

Did you consider callback annotation for the device enqueue call? While that might not change anything *now*, I'm expecting interesting optimization opportunities there at some point "soon".

> The load-from-constant nature needs to be exposed earlier, so I think this
> necessarily involves changing the convention lowering in some way and it's
> just a question of what it looks like. To summarize the 2.5 options I've come
> up with are
>
> 1. Use constant byval parameters, as this patch does. This requires the least
> implementation effort but doesn't exactly fit in with byval as defined.

And, as was noted in the `byval` lang ref patch (D79636 <https://reviews.llvm.org/D79636>), there is a reasonable argument to be made to phase-out `byval` in favor of some explicit copying. If that happens, this solution should not be "the last `byval` user". Also, `byval` arguments could be used as scratchpad by smart optimizations. I somehow don't believe this is a great choice but I can by now see that the others are neither.

> 2. Replace all IR argument uses with loads from a constant offset from an
> intrinsic call. This still needs to leave the IR arguments in place because
> we do need to know the original argument sizes and offsets, but they would
> never have a use (or I would need to invent some other method of tracking
> this information)
> 3. Keep clang IR generation unchanged, but move the pass that lowers
> arguments to loads earlier and hack out aggregate IR loads before SROA makes
> things worse. This is really just a kludgier version of option 2. We do
> ultimately do this late in the backend to enable vectorization, but it does
> seem to make the middle end optimizer unhappy


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79744/new/

https://reviews.llvm.org/D79744

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits