tqchen commented on PR #16183: URL: https://github.com/apache/tvm/pull/16183#issuecomment-2273212033
We just found a perf regression introduced by this PR: during the LLM decode function, the PackedFunc calling overhead goes up to 1.4 ms. This can impact downstream systems. It is likely due to some of the automatic conversion logic introduced here. As a temporary measure, I am going to revert this first.

The changes in this PR (boxed types and bool) are valuable. Given the regression we see and the extra overhead the smart auto conversion might introduce, I think it would be helpful to split the PR into two pieces: one that introduces the boxed types/bool, and another that introduces a possibly more conservative version of automatic conversion.

Thanks @Lunderberg for the great effort, and sorry for the extra trouble. As PackedFunc calls are now at the center of our runtime execution, reducing their execution time has become an important topic, so we need to be more careful in balancing the runtime logic.