masahi opened a new pull request #9135:
URL: https://github.com/apache/tvm/pull/9135


   This addresses the issue discussed in 
https://discuss.tvm.apache.org/t/qnn-pytorch-byoc-full-integer-qnn-support/11127
   
   PyTorch stores quantized weights in a custom packed format, so we cannot 
directly access the 8-bit weights as Numpy arrays. We use a PyTorch function 
to unpack the quantized weights into float32 arrays and quantization parameters. 
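   A minimal Numpy sketch of what the unpacking yields: float32 weights that 
are an affine function of the packed int8 values and the quantization 
parameters (scale, zero point). The function name here is illustrative, not 
the actual PyTorch API.

```python
import numpy as np

def dequantize_np(q, scale, zero_point):
    # Illustrative helper (not a PyTorch API): recover float32 weights
    # from an int8 array using the affine quantization parameters.
    return scale * (q.astype(np.float32) - zero_point)

q = np.array([-64, 0, 32, 127], dtype=np.int8)
w = dequantize_np(q, scale=0.0078125, zero_point=0)
# w is a float32 array: [-0.5, 0.0, 0.25, 0.9921875]
```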
   
   By default, we use `qnn.op.quantize(...)` to recover int8 weights: we 
return float32 weights to users and rely on QNN lowering and the Relay 
constant folding pass to quantize the weights at compile time. In BYOC use 
cases, however, we cannot apply the constant folding pass to a QNN graph. 
   
   I added a new option to quantize weights in the frontend, using a function 
equivalent to `qnn.op.quantize(...)` that operates on Numpy arrays. In 
hindsight, we should have chosen this approach from the beginning. The old 
behavior is kept as the default for backward compatibility. 
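   A sketch of such a Numpy-side equivalent, assuming the usual affine 
quantization semantics of `qnn.op.quantize` (round to nearest, add the zero 
point, clip to the int8 range); the helper name is hypothetical.

```python
import numpy as np

def quantize_np(weight, scale, zero_point, dtype=np.int8):
    # Hypothetical helper mirroring qnn.op.quantize semantics on a
    # Numpy array: q = clip(round(w / scale) + zero_point, qmin, qmax).
    info = np.iinfo(dtype)
    q = np.round(weight / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

w = np.array([-0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q = quantize_np(w, scale=0.0078125, zero_point=0)
# q is an int8 array: [-64, 0, 32, 127] (1.0 saturates to 127)
```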
   
   cc @comaniac 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
