tqchen commented on PR #16183:
URL: https://github.com/apache/tvm/pull/16183#issuecomment-2273212033

   We just found a perf regression introduced by this PR: specifically, 
during the LLM decode function the PackedFunc calling overhead goes up to 1.4 ms. 
This can hurt downstream systems. It is likely due to some of 
the automatic conversion logic introduced in this PR. 
   
   As a temporary measure, I am going to revert this first.
   
   The changes in this PR (boxed types and bool) are valuable. Given that the 
smart auto conversion might introduce extra overhead, and given the regression 
we see, I think it would be helpful to split the PR into two pieces: one that 
introduces the boxed types/bool, and another that introduces a possibly more 
conservative version of automatic conversion.
   
   Thanks @Lunderberg for the great effort, and sorry for the extra trouble. As 
PackedFunc calls are now at the center of our runtime execution, reducing their 
execution time has become an important topic, so we need to be more careful in 
balancing the runtime logic.
   
    


