t-vi edited a comment on pull request #5727: URL: https://github.com/apache/incubator-tvm/pull/5727#issuecomment-639109441
That's the idea, yes. In my microbenchmark of the ImageNet softmax on the Radeon VII, I'm going from ~140µs to ~14µs. The PyTorch baseline (a handcrafted but somewhat generic kernel) is ~18µs, so this is going well. :slightly_smiling_face: Of course, the topi work is entirely @wpan11nv's; I'm quite happy I managed to enable warp reductions on ROCm, though. And the speedup isn't only from the warp reductions: the previous softmax in topi was quite unoptimized.
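For context, the warp-reduction trick replaces a shared-memory reduction loop with log2(warpSize) register-to-register shuffle steps per row (64 lanes per wavefront on ROCm). Here is a minimal NumPy sketch that emulates that shuffle-down pattern and the numerically stable softmax built on top of it; the function names are illustrative, not the actual topi/TVM API:

```python
import numpy as np

def warp_reduce_max(vals):
    # Emulate a shuffle-down max reduction across one warp/wavefront.
    # At each step, lane i combines its value with lane i + offset;
    # after log2(n) steps lane 0 holds the reduction result.
    # Assumes the lane count is a power of two (64 on ROCm).
    vals = np.array(vals, dtype=np.float64)
    n = len(vals)
    offset = n // 2
    while offset > 0:
        vals[:n - offset] = np.maximum(vals[:n - offset], vals[offset:n])
        offset //= 2
    return vals[0]

def softmax_row(x):
    # Numerically stable softmax for one row: subtract the row max
    # (obtained via the warp-style reduction) before exponentiating.
    m = warp_reduce_max(x)
    e = np.exp(np.asarray(x, dtype=np.float64) - m)
    return e / e.sum()
```

The same pattern (a max reduction, then a sum reduction over the exponentials) covers both reductions a softmax row needs, which is why moving them into warp shuffles pays off so directly.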