masahi commented on issue #10223:
URL: https://github.com/apache/tvm/issues/10223#issuecomment-1051988687


   Oops, good find! Can you send a PR? I can merge it quickly (if I do it myself, 
you'd need to wait until next week).
   
   > Sadly I'm still just a few FPS shy of my performance target so I'll have 
to keep on digging for speedups.
   
   How does TVM + cuDNN compare to PT? Since you are running in fp16, I'd hope that 
we can use tensor cores, but I've never seen grouped convolution running on 
tensor cores. Also, CUTLASS is generally faster than cuDNN, but it doesn't support 
grouped or depthwise convolution afaik.
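   For the PT comparison, here is a minimal timing sketch for the TVM + cuDNN path. It assumes you already have a Relay module `mod` and `params` (e.g. from the PyTorch frontend); the input name `"input"`, shape, dtype, and run counts are placeholders.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Offload convolutions to cuDNN where possible; everything else uses TVM kernels.
target = "cuda -libs=cudnn"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.cuda(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.randn(1, 3, 224, 224).astype("float16"))

# time_evaluator reports the mean wall-clock time of "run" in seconds.
timer = module.module.time_evaluator("run", dev, number=50, repeat=3)
print("TVM + cuDNN: %.3f ms" % (timer().mean * 1e3))
```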
   
   > RE: support for groups in the regular cuda backend. Do you have a general 
idea of what kind of changes are necessary for that
   
   Yes, you can try adding a `groups` argument to 
https://github.com/apache/tvm/blob/a1f51aa230d33bed831b65eec6e209733dbfec57/python/tvm/topi/cuda/conv2d_transpose.py#L30. 
I think it shouldn't be too hard (also see https://github.com/apache/tvm/pull/9465).
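   To make the semantics concrete, here is a rough sketch of grouped transposed convolution built from existing topi ops: slice the input channels and the kernel per group, run the current single-group compute on each slice, and concatenate. The function name is illustrative, this is a correctness reference rather than the fused compute you would actually add, and the exact `topi.nn.conv2d_transpose_nchw` signature may differ between TVM versions.

```python
import tvm
from tvm import te, topi

def grouped_conv2d_transpose_nchw(data, kernel, strides, padding,
                                  out_dtype, output_padding, groups=1):
    """Reference semantics: split channels into `groups`, run the existing
    single-group transposed conv on each slice, then concatenate the outputs.
    Assumes NCHW data and IOHW kernel layout (cin, cout // groups, kh, kw)."""
    if groups == 1:
        return topi.nn.conv2d_transpose_nchw(
            data, kernel, strides, padding, out_dtype, output_padding)
    n, cin, h, w = [int(x) for x in data.shape]
    _, cout_g, kh, kw = [int(x) for x in kernel.shape]
    cin_g = cin // groups  # input channels per group
    outs = []
    for g in range(groups):
        d = topi.strided_slice(data, [0, g * cin_g, 0, 0],
                               [n, (g + 1) * cin_g, h, w])
        k = topi.strided_slice(kernel, [g * cin_g, 0, 0, 0],
                               [(g + 1) * cin_g, cout_g, kh, kw])
        outs.append(topi.nn.conv2d_transpose_nchw(
            d, k, strides, padding, out_dtype, output_padding))
    return topi.concatenate(outs, axis=1)

# Example: 64 -> 64 channels in 16 groups, 4x4 kernel, stride-2 upsampling.
data = te.placeholder((1, 64, 32, 32), dtype="float16", name="data")
kernel = te.placeholder((64, 4, 4, 4), dtype="float16", name="kernel")
out = grouped_conv2d_transpose_nchw(data, kernel, (2, 2), (1, 1),
                                    "float16", (0, 0), groups=16)
print(out.shape)  # [1, 64, 64, 64]
```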
   
   You'd also need to update `python/tvm/relay/op/strategy/cuda.py` similarly to how 
https://github.com/apache/tvm/pull/9465 did it for x86. 
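   For reference, the relevant strategy entry looks roughly like the sketch below (reconstructed from memory of the existing conv2d_transpose registrations, so names and details may not match the source exactly); the change is essentially to stop rejecting `groups > 1` and forward `attrs.groups` to a grouped compute, as the x86 path does.

```python
# Sketch of the cuda strategy entry; in the real file this function is
# decorated with @conv2d_transpose_strategy.register(["cuda", "gpu"]).
from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import (
    wrap_compute_conv2d_transpose,
    wrap_topi_schedule,
)

def conv2d_transpose_strategy_cuda(attrs, inputs, out_type, target):
    assert attrs.data_layout == "NCHW", "this sketch only covers NCHW"
    # The current code rejects groups != 1 around here; with a grouped topi
    # compute, groups > 1 would be dispatched instead (the generic wrapper
    # also needs to pass attrs.groups through to the compute).
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_conv2d_transpose(topi.cuda.conv2d_transpose_nchw),
        wrap_topi_schedule(topi.cuda.schedule_conv2d_transpose_nchw),
        name="conv2d_transpose_nchw.cuda",
    )
    return strategy
```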
   
   You may try our auto-scheduler to see if it can beat cuDNN. 
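   A minimal auto-scheduler tuning sketch, assuming the same `mod` / `params` as above; the trial count and log file name are placeholders:

```python
import tvm
from tvm import auto_scheduler, relay

target = tvm.target.Target("cuda")
log_file = "grouped_conv_tuning.json"

# Extract tunable tasks from the Relay module and tune them jointly.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=2000,  # placeholder; more trials usually helps
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
))

# Compile using the best schedules found during tuning.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```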

