masahi opened a new pull request #9135: URL: https://github.com/apache/tvm/pull/9135
This addresses the issue discussed in https://discuss.tvm.apache.org/t/qnn-pytorch-byoc-full-integer-qnn-support/11127 PyTorch stores quantized weights in a custom format, so we cannot directly access 8 bit weights as Numpy arrays. We use a PyTorch function to unpack quantized weights into float32 arrays and quantization parameters. By default, we use `qnn.op.quantize(...)` to recover int8 weights, return float32 weights to users, and rely on the QNN lowering and the Relay constant folding pass to quantize weights at compile time. In BYOC use cases, however, we cannot apply the constant folding pass on a QNN graph. I added a new option to quantize weights in the frontend using a function that is equivalent to qnn.op.quantize(...) operating on Numpy arrays. In hindsight, we should've chosen this way from the beginning. The old behavior is kept as the default for backward compatibility. cc @comaniac -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
