masahi edited a comment on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937631623


   Hmm, interesting. I never thought about doing constant folding on 
partitioned functions. My use cases have always involved constant folding on 
`main`, before partitioning. For example, that was the case in the PyTorch 
frontend before #9135, which always produced something like 
`qnn.quantize(const_weight_fp32)`. The other case is QNN produced by the 
[FakeQuantizationToInteger](https://github.com/apache/tvm/blob/4ffbdcd0aaed4f382f06c6a9e2b2d048b6abdaa9/src/relay/transforms/fake_quantization_to_integer.cc) 
pass, which also generates many `qnn.quantize` ops with constant weights.
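   
   For concreteness, here is a minimal sketch of folding such a 
`qnn.quantize(const_weight_fp32)` expression (the weight shape, scale, and 
zero point are made-up values, and the pass combination is one way to do it, 
not necessarily what this PR proposes): canonicalize the QNN op into plain 
Relay ops, then let `FoldConstant` collapse it into a single int8 constant. 
Note that this route also loses the QNN op itself, which is exactly the 
tension discussed below.
   
   ```python
   import numpy as np
   import tvm
   from tvm import relay
   
   # A constant fp32 weight wrapped in qnn.quantize, as the PyTorch frontend
   # used to produce before #9135 (shape, scale, and zero point are made up).
   weight = relay.const(np.random.randn(4, 4).astype("float32"))
   scale = relay.const(0.05, "float32")
   zero_point = relay.const(0, "int32")
   q_weight = relay.qnn.op.quantize(weight, scale, zero_point, out_dtype="int8")
   
   mod = tvm.IRModule.from_expr(relay.Function([], q_weight))
   
   # CanonicalizeOps lowers qnn.quantize into plain Relay ops (divide, round,
   # clip, cast), after which FoldConstant can evaluate the whole expression
   # down to a single int8 constant.
   mod = relay.qnn.transform.CanonicalizeOps()(mod)
   mod = relay.transform.FoldConstant()(mod)
   print(mod)
   ```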
   
   In 2), if we run legalization on partitioned functions, wouldn't that 
decompose all QNN ops? I couldn't easily extract qparams anymore, for example. 
I needed to retain QNN ops all the way until I translated them to the external 
IR, so running legalization was never an option for me. I did wish we could 
selectively lower only the const-foldable QNN subgraphs. Maybe I'm missing 
something. A sketch of the qparam issue follows.
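   
   To make the qparam point concrete, here is a hypothetical extraction 
helper (the function name and structure are mine, not from any existing 
pass):
   
   ```python
   from tvm import relay
   
   # Hypothetical helper, not part of any TVM pass: while a qnn.quantize call
   # is retained, its qparams sit directly in call.args as
   # (data, output_scale, output_zero_point), so they can be read off the node.
   def extract_qparams(call: relay.Call):
       assert call.op.name == "qnn.quantize"
       _, scale, zero_point = call.args
       return float(scale.data.numpy()), int(zero_point.data.numpy())
   ```
   
   Once legalization has run, the call is decomposed into `divide`, `round`, 
`clip`, `cast`, etc., and no single node carries the scale and zero point 
anymore, so this kind of extraction stops working.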

