masahi edited a comment on pull request #9261:
URL: https://github.com/apache/tvm/pull/9261#issuecomment-948543423


   This is ready for review. The bulk of the PR is a kernel generator taken 
from cutlass, simplified for our purposes and extended with more epilogue 
support.
   
   The following are supported:
   * TensorCore kernels (no WMMA or SIMT)
   * `dense` with bias, bias + relu, and bias + gelu epilogue
   * fp16 and fp32 accumulation
   * Kernels for Turing and Ampere
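
As a rough reference for what the `dense` epilogues above compute, here is a plain-Python sketch. It is not the generated cutlass kernel and not code from this PR: the helper names are hypothetical, the weight is assumed to be laid out as (out_features, in_features) as in `dense`, and GELU is assumed to be the erf-based form, which may differ from what the kernel actually uses.

```python
import math


def dense(x, w):
    # y[i][j] = sum_k x[i][k] * w[j][k]; weight layout assumed (out, in).
    return [[sum(a * b for a, b in zip(row, wrow)) for wrow in w] for row in x]


def relu(v):
    return max(v, 0.0)


def gelu(v):
    # erf-based GELU (assumption; a tanh approximation is also common).
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def dense_with_epilogue(x, w, bias, activation=None):
    # Hypothetical reference helper: GEMM, then bias add, then an optional
    # elementwise activation. A fused kernel would do this in one pass.
    act = activation if activation is not None else (lambda v: v)
    return [[act(v + b) for v, b in zip(row, bias)] for row in dense(x, w)]
```

Calling `dense_with_epilogue(x, w, bias)`, `dense_with_epilogue(x, w, bias, relu)`, and `dense_with_epilogue(x, w, bias, gelu)` corresponds to the bias, bias + relu, and bias + gelu cases respectively.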
   
   The follow-up work will add more features:
   * dynamic input (basic support only; the heuristics are complicated, so I 
am deferring them)
   * conv2d and batched gemm
   * More epilogues to support e2e models. I'm planning to test MaskRCNN, 
DeeplabV3, EfficientNet v2, and BERT-large. This requires some work to extend 
cutlass (see https://github.com/NVIDIA/cutlass/discussions/347)
   
   Low priority (for me), but we should support these eventually:
   * Heuristic for dynamic inputs
   * int8 and int4 kernels


