Laurawly edited a comment on pull request #6839: URL: https://github.com/apache/tvm/pull/6839#issuecomment-742808115
> So, we seem to be at a bit of an impass. The implementation that is in master is not 100% correct, and it was even less correct before #7005. This PR is the best combination of correctness and performance we have a the moment. I would like to get it merged to have a correct implementation in place, but I would also like to find a way to speed up get valid counts, what I have here is not great. > > I'm happy to spend the next few days trying to improve parallelism in get_valid_counts, but given that this is a performance improvement over main and it's more correct, I think we shouldn't block merging over the current performance, we could always do a second PR to improve performance once we have the unit tests in place We have tested the correctness of mxnet ssd related detection models before PR #7005 and they are in the correctness margin for customers to ship. I do suggest a perf comparison of your implementation and the one in main block by block. And an overall performance comparison with the implementation before PR #7005's change to `nms.py`. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org