[clang] [OpenMP][Clang] Force use of `num_teams` and `thread_limit` for bare kernel (PR #68373)

Shilei Tian via cfe-commits Fri, 06 Oct 2023 12:07:40 -0700

shiltian wrote:

> I think the follow-up, to force the user bound for bare kernels, makes sense.
> I am not sold on this patch though. Why would we disallow users from doing the
> same looping we do in the deviceRTL while hoping the offload runtime will
> pick a good grid size?


Because we don't have the loop trip count in this case, the runtime has to 
pick the grid size on its own: 3200 thread blocks and 128 threads per thread 
block, IIRC. I'm not sure that can be called a "good" grid size, and we don't 
have any heuristic without the loop trip count anyway.

Typically, when writing a CUDA/HIP kernel, users calculate the grid/block size 
manually and launch the kernel using those sizes. That is the main reason for 
this patch. It also makes the runtime's decision much easier: if we can't 
meet the user's requirement, we crash.
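
To make the CUDA/HIP habit above concrete, here is a minimal sketch in plain C++ of how a user might derive the grid/block size that would then be passed as `num_teams` and `thread_limit`. It is not taken from the patch: `LaunchConfig`, `computeLaunchConfig`, `validateOrFail`, and the device limits are hypothetical names for illustration, and the exception only stands in for the "we crash" behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical launch configuration; the names are illustrative only.
struct LaunchConfig {
  std::uint64_t num_blocks;        // would be passed as num_teams
  std::uint32_t threads_per_block; // would be passed as thread_limit
};

// One thread per element: round the block count up so that every element
// is covered even when num_elements is not a multiple of the block size.
inline LaunchConfig computeLaunchConfig(std::uint64_t num_elements,
                                        std::uint32_t threads_per_block) {
  assert(threads_per_block > 0 && "block size must be positive");
  std::uint64_t blocks =
      (num_elements + threads_per_block - 1) / threads_per_block;
  return {blocks, threads_per_block};
}

// Honor the request exactly or fail loudly; never silently pick another
// size. max_blocks and max_threads are assumed (hypothetical) device limits.
inline void validateOrFail(const LaunchConfig &cfg, std::uint64_t max_blocks,
                           std::uint32_t max_threads) {
  if (cfg.num_blocks == 0 || cfg.num_blocks > max_blocks ||
      cfg.threads_per_block > max_threads)
    throw std::runtime_error("requested launch configuration cannot be met");
}
```

With 1000 elements and 128 threads per block this yields 8 blocks, which is the kind of user-computed bound a bare kernel would be expected to receive unchanged.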

https://github.com/llvm/llvm-project/pull/68373
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
