mbrookhart commented on pull request #6839: URL: https://github.com/apache/tvm/pull/6839#issuecomment-742807888
@Laurawly I just rewrote get_valid_counts in the way you suggested. I still need to better parallelize the sum/conditional scan operation, but this takes it to:

```
Ops                               Time(us)  Time(%)  Shape           Inputs  Outputs
---                               --------  -------  -----           ------  -------
fused_vision_non_max_suppression  12105.9   25.723   (1, 122640, 6)
fused_vision_get_valid_counts     3517.22   7.474    (1, 122640, 6)
```

I'm going to take another look at NMS before I try to parallelize the sum.
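For readers unfamiliar with the "sum/conditional scan" step mentioned above, here is a minimal NumPy sketch of scan-based compaction: a prefix sum over the valid flags gives each kept box its output slot. It is not the TVM kernel from this PR. The function name `get_valid_counts_scan`, the `score_index` default, and the `-1` padding are assumptions made for illustration only.

```python
import numpy as np


def get_valid_counts_scan(data, score_threshold, score_index=1):
    """Compact boxes whose score exceeds the threshold using a prefix sum.

    data has shape (batch, num_anchors, elem_length).
    Returns (valid_count, out_data, out_indices); unused slots hold -1.
    """
    batch, num_anchors, _ = data.shape
    # 1 where the box is kept, 0 otherwise.
    valid = (data[:, :, score_index] > score_threshold).astype(np.int32)
    # The inclusive scan counts kept boxes up to and including each anchor;
    # subtracting the flag turns it into an exclusive scan (the output slot).
    inclusive = np.cumsum(valid, axis=1)
    positions = inclusive - valid
    valid_count = inclusive[:, -1].astype(np.int32)

    out_data = np.full_like(data, -1.0)
    out_indices = np.full((batch, num_anchors), -1, dtype=np.int32)
    for b in range(batch):
        src = np.nonzero(valid[b])[0]   # anchors that pass the threshold
        dst = positions[b, src]         # their compacted positions
        out_data[b, dst] = data[b, src]
        out_indices[b, dst] = src
    return valid_count, out_data, out_indices
```

Once the positions are known, every scatter write is independent, so the scan itself is the part that needs a parallel algorithm on a GPU.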