mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742807888


   @Laurawly I just rewrote get_valid_counts the way you suggested. I still
need to better parallelize the sum/conditional scan operation, but the rewrite
already brings the profile to:
   ```
   Ops                               Time(us)  Time(%)  Shape           Inputs  Outputs
   ---                               --------  -------  -----           ------  -------
   fused_vision_non_max_suppression  12105.9   25.723   (1, 122640, 6)
   fused_vision_get_valid_counts     3517.22   7.474    (1, 122640, 6)
   ```
   I'm going to take another look at NMS before I try to parallelize the sum.
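
For readers following along, the "sum/conditional scan" step can be pictured as stream compaction: flag the rows that pass the score threshold, take an exclusive prefix sum of the flags to get each valid row's output slot, then scatter. The sketch below is a plain-Python illustration only, not the TVM kernel from this PR. The function name `get_valid_counts_sketch`, the `score_index` default, the strict `>` comparison, and the `-1` padding are assumptions made for the example. The prefix sum is written sequentially here; it is the part that would need a parallel scan on the GPU.

```python
from itertools import accumulate


def get_valid_counts_sketch(boxes, score_threshold, score_index=1):
    """Hypothetical single-batch sketch; each row is e.g. [class_id, score, x1, y1, x2, y2]."""
    # Step 1 (independent per row): flag rows whose score passes the threshold.
    flags = [1 if row[score_index] > score_threshold else 0 for row in boxes]

    # Step 2: exclusive prefix sum of the flags gives each valid row's output slot.
    # This is the sequential stand-in for the scan that would be parallelized.
    inclusive = list(accumulate(flags))
    positions = [inc - f for inc, f in zip(inclusive, flags)]
    valid_count = inclusive[-1] if inclusive else 0

    # Step 3 (independent per row): scatter valid rows to the front, pad the rest with -1.
    width = len(boxes[0]) if boxes else 0
    out = [[-1.0] * width for _ in boxes]
    out_indices = [-1] * len(boxes)
    for i, row in enumerate(boxes):
        if flags[i]:
            out[positions[i]] = list(row)
            out_indices[positions[i]] = i
    return valid_count, out, out_indices


if __name__ == "__main__":
    demo = [
        [0.0, 0.9, 0.0, 0.0, 1.0, 1.0],
        [0.0, 0.1, 0.0, 0.0, 1.0, 1.0],
        [1.0, 0.8, 0.0, 0.0, 2.0, 2.0],
    ]
    print(get_valid_counts_sketch(demo, 0.5))
```

Steps 1 and 3 are trivially parallel over rows, so the scan in step 2 is the only part with a cross-row dependency.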


