roastduck commented on issue #5307: [TIR] Make lower_warp_memory support extent(threadIdx.x) < warp_size
URL: https://github.com/apache/incubator-tvm/pull/5307#issuecomment-612561375

I see, so you are talking about the programming model rather than the code generation. In the alternative model, a user doesn't cache a buffer to "warp" scope. Instead, a user accesses variables from other threads directly through indices. Have I understood your idea correctly?

What are the advantages of the new model over the current one? You said the current model conceptually reduces a warp to half. Do you mean that it is hard to mix full-warp shuffling and half-warp shuffling for a single buffer? I think this problem can be solved by improving the detection algorithms used in the `lower_warp_memory` pass, without switching to a new programming model.

If the expressive power of the two models is equivalent, I prefer the current "bind to warp scope" model, because it is more general and there is no need to introduce a new language feature.
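For readers unfamiliar with the full-warp versus half-warp distinction raised above, the following is a minimal pure-Python sketch of how a shuffle with a `width` argument behaves for one fully active warp, modeled on the documented semantics of CUDA's `__shfl_sync`. It is not TVM code and does not reflect how `lower_warp_memory` is implemented; the names `shfl`, `regs`, `full`, and `half` are hypothetical and exist only for illustration.

```python
WARP_SIZE = 32  # assumption: a 32-lane warp, as on NVIDIA GPUs


def shfl(values, src_lane, width=WARP_SIZE):
    """Model a warp shuffle for one fully active warp.

    values[i] is the register held by lane i. src_lane(i) is the lane that
    lane i asks to read, interpreted relative to lane i's own group of
    `width` consecutive lanes and wrapped modulo `width`.
    """
    assert len(values) == WARP_SIZE and WARP_SIZE % width == 0
    out = []
    for lane in range(WARP_SIZE):
        group_base = (lane // width) * width  # first lane of this lane's group
        out.append(values[group_base + src_lane(lane) % width])
    return out


regs = list(range(WARP_SIZE))  # lane i holds the value i

# Full-warp shuffle: every lane reads its right neighbour, wrapping at 32.
full = shfl(regs, lambda lane: lane + 1)

# Half-warp shuffle: same request, but lanes 0-15 and 16-31 wrap separately.
half = shfl(regs, lambda lane: lane + 1, width=16)
```

In this model, the two calls differ only in `width`, which is why both access patterns can in principle be applied to the same per-lane buffer; whether a lowering pass can detect each pattern automatically is the question discussed in the comment.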
