t-vi edited a comment on pull request #5727:
URL: https://github.com/apache/incubator-tvm/pull/5727#issuecomment-639109441


   That's the idea, yes. In my microbenchmark of the ImageNet softmax on the 
Radeon VII, I'm going from ~140µs to ~14µs. The baseline from PyTorch 
(a handcrafted but somewhat generic kernel) is ~18µs, so this is going well. 
:slightly_smiling_face: 
   Of course, the topi work is entirely @wpan11nv's. I'm quite happy I managed 
to enable warp reductions on ROCm, though.
   And the speedup is not just from the warp reductions; the previous softmax in 
topi was also quite unoptimized.
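   (For context, a warp reduction keeps partial results in registers and combines 
them with intra-warp shuffles instead of round-tripping through shared memory. 
The sketch below is only an illustration of that pattern using CUDA-style 
intrinsics and a hypothetical helper name `warp_reduce_sum`; it is not the code 
from this PR, and on ROCm the wavefront is 64 lanes wide and the shuffle is 
lowered differently.)

   ```cuda
   // Illustrative sketch only: warp-level sum reduction via register shuffles,
   // the kind of primitive a warp-reduction softmax uses for its max/sum steps.
   __device__ float warp_reduce_sum(float val) {
     // warpSize is 32 on CUDA GPUs; ROCm wavefronts have 64 lanes.
     for (int offset = warpSize / 2; offset > 0; offset /= 2) {
       // Each lane adds the value held by the lane `offset` positions above it.
       val += __shfl_down_sync(0xffffffffu, val, offset);
     }
     return val;  // lane 0 ends up holding the sum over the whole warp
   }
   ```

   In TVM the analogous shuffles are emitted by the codegen when warp-level 
reduction is enabled for the target, which is what this change extends to ROCm.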

