[GitHub] [incubator-tvm] t-vi edited a comment on pull request #5600: [TOPI] Improve CUDA softmax scheduling

GitBox Thu, 04 Jun 2020 00:48:36 -0700


t-vi edited a comment on pull request #5600:
URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638622419



   I'm adding shfl intrinsics to the rocm bits (using the 
`tvm.intrin.rule.rocm.tvm_warp_shuffle` definition and its `_up`/`_down` 
variants).
   I'll probably run into the nvptx bits in the llvm codegen next. Is there a 
reason not to use the intrin.rule mechanism for nvptx as well?
   I'm not sure whether running `gpu_imagenet_bench.py` (my first stop for 
checking whether anything works) with the nvptx target works for me. I do get 
as far as the codegen for it, but I don't know whether it worked before 
either...
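
For reference, here is a rough plain-Python model of the lane semantics I'd expect the three shuffle variants to provide. It assumes CUDA-style behaviour (a lane with no valid source keeps its own value). The helper names are made up for illustration and are not TVM or ROCm APIs. The width of 32 in the usage note is also just an assumption, and rocm wavefronts may be wider.

```python
# Plain-Python model of warp shuffle semantics.
# Assumptions: CUDA-style behaviour; hypothetical helper names; each list
# index stands for one lane of a warp.

def shuffle(vals, src_lane):
    """Every lane reads the value held by lane `src_lane` (wrapped to the width)."""
    return [vals[src_lane % len(vals)] for _ in vals]

def shuffle_up(vals, delta):
    """Lane i reads lane i - delta; lanes without a source keep their own value."""
    return [vals[i - delta] if i - delta >= 0 else vals[i]
            for i in range(len(vals))]

def shuffle_down(vals, delta):
    """Lane i reads lane i + delta; lanes without a source keep their own value."""
    n = len(vals)
    return [vals[i + delta] if i + delta < n else vals[i] for i in range(n)]

def warp_reduce_sum(vals):
    """Tree reduction built on shuffle_down; lane 0 ends up with the full sum.

    Assumes the number of lanes is a power of two.
    """
    vals = list(vals)
    delta = len(vals) // 2
    while delta >= 1:
        moved = shuffle_down(vals, delta)
        vals = [a + b for a, b in zip(vals, moved)]
        delta //= 2
    return vals[0]
```

Calling `warp_reduce_sum(list(range(32)))` models the kind of cross-lane sum a warp-level reduction performs. Only lane 0's result is meaningful at the end, since the upper lanes add in their own stale values.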
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
