roastduck opened a new pull request #5307: [TIR] Make lower_warp_memory support extent(threadIdx.x) < warp_size
URL: https://github.com/apache/incubator-tvm/pull/5307
 
 
   
   
   Pass `lower_warp_memory` lowers memory bound to the "warp" scope into warp
shuffle intrinsics. Currently, this pass only supports the case where the
extent of `threadIdx.x` equals the warp size. However, CUDA's `__shfl` has a
3rd parameter, `width`, which shuffles values within a half (or 1/4, 1/8, 1/16)
of a warp. This PR uses this extra parameter to enable pass `lower_warp_memory`
when the extent of `threadIdx.x` is less than the warp size.
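   The effect of `width` can be modeled with a small Python simulation of
CUDA's documented shuffle semantics: each `width`-sized segment of the warp
behaves as an independent group whose lanes are renumbered from 0, and a
source lane outside `[0, width)` wraps modulo `width` within the same segment.
The function name `shfl_with_width` is hypothetical. Recent CUDA versions spell
this intrinsic `__shfl_sync(mask, var, srcLane, width)`; the older
`__shfl(var, srcLane, width)` has the same `width` semantics.

```python
def shfl_with_width(values, src_lane, width, warp_size=32):
    """Simulate CUDA's __shfl(var, srcLane, width) across one warp.

    values[i] is the value held by lane i. Every lane reads the value at
    logical lane src_lane of its own width-sized segment of the warp.
    """
    if len(values) != warp_size:
        raise ValueError("expected exactly one value per lane")
    # CUDA requires width to be a power of two in [1, warp_size].
    if width < 1 or width > warp_size or width & (width - 1):
        raise ValueError("width must be a power of two no larger than warp_size")
    result = []
    for lane in range(warp_size):
        # First physical lane of the segment this lane belongs to.
        segment_start = (lane // width) * width
        # A src_lane outside [0, width) wraps around inside the segment.
        result.append(values[segment_start + src_lane % width])
    return result
```

With a 32-lane warp and `width = 16`, `shfl_with_width(values, 3, 16)` gives
lanes 0 to 15 the value held by lane 3 and lanes 16 to 31 the value held by
lane 19, so two groups of 16 `threadIdx.x` threads sharing one warp shuffle
independently of each other.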
   
   
   
   Changes:
   
   1. Add a 3rd parameter `width` and a 4th parameter `warp_size` to the TVM
intrinsic `tvm_warp_shuffle`. The 4th parameter `warp_size` helps a code
generator decide whether a given `width` is legal. For example, the OpenCL
backend does not support the `width` parameter, so it has to check that
`width == warp_size`. Since `lower_warp_memory` is currently the only pass that
uses `tvm_warp_shuffle`, this change will not break any dependencies.
   2. Code generators that lower `tvm_warp_shuffle` are modified. Currently,
the only two affected code generators are CUDA and OpenCL.
   3. In `lower_warp_memory`, first find the value of `width`, then alter the
IR based on `width` instead of `warp_size`. Finally, generate the modified
`tvm_warp_shuffle` intrinsic.
   4. A test that runs `lower_warp_memory` with half the warp size is added.
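   The division of labor in items 1 to 3 can be sketched as a minimal Python
model. This is not TVM code: the names `is_legal_width`, `choose_width` and
`lower_warp_shuffle` are hypothetical, the operand order
`(value, lane, width, warp_size)` is assumed from the description of the 3rd
and 4th parameters, the emitted strings are illustrative, and using Intel's
`intel_sub_group_shuffle` (which takes no width argument) as the OpenCL target
is an assumption. The power-of-two rule is CUDA's; the sketch applies it to
both backends for simplicity.

```python
def is_legal_width(width, warp_size):
    """A shuffle width must be a power of two no larger than the warp size."""
    return 1 <= width <= warp_size and (width & (width - 1)) == 0


def choose_width(thread_extent, warp_size):
    """Hypothetical: take the extent of threadIdx.x as the shuffle width."""
    if not is_legal_width(thread_extent, warp_size):
        raise ValueError(
            f"extent {thread_extent} is not a legal width for warp size {warp_size}")
    return thread_extent


def lower_warp_shuffle(backend, value, lane, width, warp_size):
    """Hypothetical codegen for tvm_warp_shuffle(value, lane, width, warp_size).

    Returns an illustrative source string, not TVM's actual output.
    """
    if not is_legal_width(width, warp_size):
        raise ValueError(f"illegal width {width} for warp size {warp_size}")
    if backend == "cuda":
        # CUDA's shuffle accepts the segment width as its last argument.
        return f"__shfl({value}, {lane}, {width})"
    if backend == "opencl":
        # The assumed OpenCL built-in has no width argument, so only a
        # full-warp (full-subgroup) shuffle can be expressed.
        if width != warp_size:
            raise ValueError("OpenCL backend requires width == warp_size")
        return f"intel_sub_group_shuffle({value}, {lane})"
    raise ValueError(f"unsupported backend: {backend!r}")
```

For example, `choose_width(16, 32)` returns 16, and
`lower_warp_shuffle("cuda", "x", "3", 16, 32)` returns `"__shfl(x, 3, 16)"`,
while the same call with `"opencl"` raises `ValueError`. Passing `warp_size`
alongside `width` is what lets each backend make this check locally.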
   
   Could @tqchen, @ZihengJiang or @ajtulloch review this PR or suggest other
reviewers?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
