mbrookhart commented on pull request #7123:
URL: https://github.com/apache/tvm/pull/7123#issuecomment-756239725


   Looking at the code, assuming you have thrust enabled, this should be kernel0:
   
https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L798-L811
   The thrust argsort won't get a number:
   
https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L818-L820
   And this should be kernel1:
   
https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L543-L579
   
   That could have threads `(1,1,1),(1024,1,1)` if we have batch_size=1 and num_anchors <= 1024. I'm not seeing anything in there that jumps out as having an issue, though. Every use of `j` is guarded by an if scope with `j < num_anchors`, `j < nkeep`, or `j < valid_count`, and nkeep is strictly less than valid_count. The only way it could fail is if valid_count > num_anchors...
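
The bounds argument above can be sanity-checked with a small host-side simulation. This is only a sketch of the reasoning, not the TVM kernel: it assumes a single block of 1024 threads (batch_size=1), assumes each guarded access touches index `j` of an array of length `num_anchors`, and the function names are hypothetical.

```python
def simulate_guarded_accesses(num_anchors, valid_count, nkeep, num_threads=1024):
    """Return every index touched by a simulated block of `num_threads` threads.

    Hypothetical model of the guards described above: thread j only touches
    index j inside an if scope on j < num_anchors, j < nkeep, or j < valid_count.
    """
    touched = []
    for j in range(num_threads):
        if j < num_anchors:
            touched.append(j)
        if j < nkeep:
            touched.append(j)
        if j < valid_count:
            touched.append(j)
    return touched


def out_of_bounds(num_anchors, valid_count, nkeep, num_threads=1024):
    """True if any simulated access falls outside an array of length num_anchors."""
    return any(
        i >= num_anchors
        for i in simulate_guarded_accesses(num_anchors, valid_count, nkeep, num_threads)
    )
```

With nkeep < valid_count <= num_anchors, e.g. `out_of_bounds(500, 400, 300)`, no access escapes; only a case like `out_of_bounds(500, 600, 300)`, where valid_count exceeds num_anchors, reaches past the end.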
   
   So possibly it's failing because my changes to get_valid_count are returning the wrong valid_count.
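
For reference, here is a minimal NumPy sketch of how an exclusive scan can drive get_valid_count-style compaction, with the invariant the NMS kernel depends on asserted explicitly. It assumes a box is valid iff its score is strictly greater than `score_threshold` (a simplification; the real op may also check the class id), assumes a `(batch, num_anchors, elem_length)` layout, and `reference_get_valid_count` / `exclusive_scan` are hypothetical helpers, not TVM APIs.

```python
import numpy as np


def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] = sum(flags[:i])."""
    out = np.zeros_like(flags)
    out[1:] = np.cumsum(flags)[:-1]
    return out


def reference_get_valid_count(data, score_threshold, score_index=1):
    """Hypothetical NumPy reference for get_valid_count-style compaction.

    Valid boxes are moved to the front of `out` in their original order; the
    remaining slots are filled with -1.
    """
    batch_size, num_anchors, _ = data.shape
    valid_count = np.zeros(batch_size, dtype=np.int32)
    out = np.full_like(data, -1.0)
    out_indices = np.full((batch_size, num_anchors), -1, dtype=np.int32)
    for b in range(batch_size):
        flags = (data[b, :, score_index] > score_threshold).astype(np.int32)
        positions = exclusive_scan(flags)  # destination slot of each valid box
        for i in range(num_anchors):
            if flags[i]:
                out[b, positions[i]] = data[b, i]
                out_indices[b, positions[i]] = i
        # Total = last exclusive-scan value plus the last flag (0 if empty).
        valid_count[b] = positions[-1] + flags[-1] if num_anchors else 0
        # The invariant the NMS kernel relies on.
        assert valid_count[b] <= num_anchors
    return valid_count, out, out_indices
```

The total is taken from the last scan entry plus the last flag, which is exactly the spot where an off-by-one in a scan would surface as a valid_count that is too large.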
   
   @trevor-m any chance we can dump the inputs/attrs for get_valid_count so I can make a unit test to check that hypothesis? I haven't been able to get it to fail with random inputs, but there may be an edge case in my exclusive_scan algorithm for this input data.
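
Until the real inputs are available, one way to look for such an edge case is to compare a blocked scan, structured the way a GPU scan typically is (per-block local scans, then a scan over block totals added back), against a serial reference at sizes that straddle the block boundary. This is a hypothetical two-level scan in NumPy for illustration only; it is not TVM's exclusive_scan, and the block size of 1024 is an assumption matching the thread count above.

```python
import numpy as np


def serial_exclusive_scan(x):
    """Serial reference: out[i] = sum(x[:i])."""
    out = np.zeros_like(x)
    out[1:] = np.cumsum(x)[:-1]
    return out


def blocked_exclusive_scan(x, block_size=1024):
    """Two-level exclusive scan mimicking a per-block GPU scan (hypothetical)."""
    n = x.shape[0]
    num_blocks = (n + block_size - 1) // block_size
    out = np.zeros_like(x)
    block_totals = np.zeros(num_blocks, dtype=x.dtype)
    # Pass 1: independent exclusive scan inside each block, recording totals.
    for blk in range(num_blocks):
        lo, hi = blk * block_size, min((blk + 1) * block_size, n)
        out[lo:hi] = serial_exclusive_scan(x[lo:hi])
        block_totals[blk] = x[lo:hi].sum()
    # Pass 2: an exclusive scan of the block totals gives each block's offset.
    offsets = serial_exclusive_scan(block_totals)
    for blk in range(num_blocks):
        lo, hi = blk * block_size, min((blk + 1) * block_size, n)
        out[lo:hi] += offsets[blk]
    return out


def check_edge_cases(block_size=1024):
    """Return (size, pattern) pairs where the blocked scan disagrees or overcounts."""
    sizes = [0, 1, block_size - 1, block_size, block_size + 1, 2 * block_size + 1]
    rng = np.random.default_rng(0)
    failures = []
    for n in sizes:
        patterns = {
            "all_valid": np.ones(n, dtype=np.int32),
            "none_valid": np.zeros(n, dtype=np.int32),
            "random": rng.integers(0, 2, size=n, dtype=np.int32),
        }
        for name, flags in patterns.items():
            expected = serial_exclusive_scan(flags)
            got = blocked_exclusive_scan(flags, block_size)
            total = int(got[-1] + flags[-1]) if n else 0
            if not np.array_equal(got, expected) or total > n:
                failures.append((n, name))
    return failures
```

Any `(size, pattern)` pair returned would be a candidate input for the unit test; the same harness could be pointed at the output of the actual scan instead of `blocked_exclusive_scan`.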


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
