guberti opened a new pull request, #12671:
URL: https://github.com/apache/tvm/pull/12671

   Currently, Relay QNN uses its `helper_no_fast_int8_hw_legalization` to change 
all `int8` operations in Cortex-M-compiled models into `int16` operations. I 
believe we do this because Cortex-M chips lack a `4xint8` 
multiply-accumulate instruction, so in some sense they do not have fast `int8` 
operations.
   
   However, changing our operators to `int16` is substantially slower: while it 
saves a few sign-extension operations, it doubles the number of memory loads 
we need to perform. This PR changes that behavior and ensures that Cortex-M 
microcontrollers no longer have their `int8` operations turned into `int16` ones.
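   The trade-off can be sketched with plain NumPy. This is an illustrative model of what the legalization amounts to, not TVM's actual implementation; the function names here are hypothetical. Both paths accumulate in `int32`, so the results are bit-identical and only the memory traffic per element differs:

   ```python
   import numpy as np

   def qnn_dot_int8(a, b):
       # Post-change path: keep operands int8 (1 byte loaded per
       # element), widen only inside the multiply-accumulate.
       return int(np.sum(a.astype(np.int32) * b.astype(np.int32)))

   def qnn_dot_legalized_int16(a, b):
       # Legalized path: upcast operands to int16 first, which
       # doubles the bytes loaded per element on the device.
       a16 = a.astype(np.int16)
       b16 = b.astype(np.int16)
       return int(np.sum(a16.astype(np.int32) * b16.astype(np.int32)))

   rng = np.random.default_rng(0)
   a = rng.integers(-128, 128, size=64, dtype=np.int8)
   b = rng.integers(-128, 128, size=64, dtype=np.int8)

   # Identical arithmetic, different memory footprint.
   assert qnn_dot_int8(a, b) == qnn_dot_legalized_int16(a, b)
   ```

   Since the intermediate products fit in `int32` either way, skipping the `int16` upcast changes performance but not numerics, which is consistent with the accuracy observation below.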
   
   I have also verified that this does, in fact, improve performance on some 
common models. For example, MobileNet_v1_0.25 on the Cortex-M4 ran 10% 
faster than before this change, and accuracy does not appear to be affected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
