tqchen commented on issue #5307: [TIR] Make lower_warp_memory support extent(threadIdx.x) < warp_size
URL: https://github.com/apache/incubator-tvm/pull/5307#issuecomment-612638955

Thanks for the discussions so far. It would be great to have a discussion (perhaps in the forum) about the possible conventions for defining the warp, just to clarify further.

- A0: Enforce extent(threadIdx.x) == warp_size, and make it the warp index
  - Add sub-warp shuffle detection directly onto the warp shuffle, to detect index patterns like ```B[(wi/4)*4+ ((wi % 4) +1) %4]```, which correspond to a sub-warp shuffle.
- A1: Warp virtualization: allow threadIdx.x to be at the sub-warp level (this PR).

From what I see, A0 might introduce a bit more complexity, but it allows a mixture of sub-warp and full-warp shuffle patterns. A1 can be viewed as "virtualizing the warps": we have more virtual warps that act at the sub-warp level and use a single physical warp to simulate them, but we cannot mix that with full-warp shuffles.

It would be great if we can think a bit about how to describe these differences clearly and document them in the code comments, and possibly in the future developer docs. If my understanding is correct, and we can document A1's concept clearly, I can go ahead and merge this PR first, then we can do follow-up discussions.
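
To make the two conventions concrete, here is a minimal Python simulation of a single 32-lane warp. It is only a sketch for discussion: `subwarp_rotate_src`, `is_subwarp_pattern`, `shuffle`, and `virtual_warp_shuffle` are hypothetical helper names, not TVM APIs, and nothing here reflects how `lower_warp_memory` is actually implemented. It shows that the A0 index pattern `(wi/4)*4 + ((wi % 4) + 1) % 4` never leaves its group of 4 lanes, and that the same data movement can be written A1-style as a rotation inside virtual warps of size 4.

```python
WARP_SIZE = 32  # assumed physical warp width (NVIDIA-style)


def subwarp_rotate_src(wi: int, group: int = 4) -> int:
    """Source lane for the pattern (wi/group)*group + ((wi % group) + 1) % group."""
    return (wi // group) * group + ((wi % group) + 1) % group


def is_subwarp_pattern(src_of, group: int, warp_size: int = WARP_SIZE) -> bool:
    """True if every lane only reads from a lane in its own `group`-wide block."""
    return all(src_of(wi) // group == wi // group for wi in range(warp_size))


def shuffle(values, src_of):
    """A0 view: full-warp shuffle, lane wi receives values[src_of(wi)]."""
    return [values[src_of(wi)] for wi in range(len(values))]


def virtual_warp_shuffle(values, local_src_of, virtual_size: int):
    """A1 view: each block of `virtual_size` lanes acts as its own warp.

    `local_src_of` maps a lane index inside the virtual warp
    (0 .. virtual_size - 1) to the local lane it reads from.
    """
    out = []
    for wi in range(len(values)):
        base = (wi // virtual_size) * virtual_size  # first lane of this virtual warp
        out.append(values[base + local_src_of(wi % virtual_size)])
    return out


lanes = list(range(WARP_SIZE))

# A0: full-warp index expression that happens to stay inside groups of 4.
a0 = shuffle(lanes, subwarp_rotate_src)

# A1: the same movement expressed as a rotate-by-one inside virtual warps of 4.
a1 = virtual_warp_shuffle(lanes, lambda t: (t + 1) % 4, virtual_size=4)

print(a0[:8])    # first two groups of 4 lanes
print(a0 == a1)  # both views describe the same data movement
```

In this sketch the A0 check (`is_subwarp_pattern`) has to inspect the index expression to discover the grouping, whereas the A1 form carries the group size explicitly and cannot express a lane reading across virtual-warp boundaries. That matches the trade-off described above: A0 can mix sub-warp and full-warp patterns at the cost of detection logic, while A1 is simpler but restricted to the sub-warp level.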
