masahi edited a comment on pull request #9164: URL: https://github.com/apache/tvm/pull/9164#issuecomment-937631623
Hmm, interesting. I never thought about doing constant folding on partitioned functions. My use cases have always involved constant folding on `main`, before partitioning:

1. The PyTorch frontend before #9135, which always produced something like `qnn.quantize(const_weight_fp32)`.
2. QNN produced by the [FakeQuantizationToInteger](https://github.com/apache/tvm/blob/4ffbdcd0aaed4f382f06c6a9e2b2d048b6abdaa9/src/relay/transforms/fake_quantization_to_integer.cc) pass, which also generates many `qnn.quantize` ops with constant weights.

In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops? For example, I couldn't easily extract qparams anymore. I needed to retain QNN ops all the way until I translated them to the external IR, so running legalization was never an option for me. I did wish we could selectively lower only the const-foldable QNN subgraphs. Maybe I'm missing something.
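To make the ordering concrete, here is a minimal sketch of the flow I mean: fold constants while everything is still in `main`, then partition. The `"my_codegen"` target name is just a placeholder for a registered external codegen, and whether plain `FoldConstant` can actually evaluate QNN expressions depends on the TVM version (newer builds expose a `fold_qnn` option for exactly this).

```python
import tvm
from tvm import relay


def fold_then_partition(mod, target_name="my_codegen"):
    """Fold qnn.quantize(const_weight_fp32)-style subgraphs while the whole
    graph is still in `main`, then hand the result to BYOC partitioning.
    `target_name` is a placeholder for a registered external codegen."""
    seq = tvm.transform.Sequential(
        [
            relay.transform.InferType(),
            # Fold constants before partitioning, so the partitioned
            # functions see pre-quantized weights instead of
            # qnn.quantize(const).
            relay.transform.FoldConstant(),
            # Standard BYOC partitioning flow.
            relay.transform.AnnotateTarget(target_name),
            relay.transform.MergeCompilerRegions(),
            relay.transform.PartitionGraph(),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)


# What I want to avoid: a global legalize such as
#   mod = relay.qnn.transform.Legalize()(mod)
# decomposes every QNN op into low-level Relay ops, so the qparams carried
# by qnn.quantize / qnn.conv2d are gone by the time I translate the
# partitioned functions to the external IR.
```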