[quote="anijain2305, post:27, topic:6256, full:true"]
For rasp3 and rasp4, we saw a 1.3x - 1.5x performance speedup going from FP32 to
INT8.
The work behind the linked QNNPACK vs. TVM comparison has not been upstreamed yet.
If I understand correctly, it will be some time before the authors of that work
are able to upstream it. There are also some differences in the underlying design,
which might cause some delays in reaching that performance.
Regarding int16, we observed that LLVM can generate good enough code with int16
instead of int8 for rasp3/4. So we upcast the datatype to int16 (the exceptions are
Intel Cascade Lake and Nvidia devices). When we write a better schedule with
int8 datatypes, we can remove the upcasting.
[/quote]
@anijain2305
Hi, I tested the speed of FP32 and INT8 for SqueezeNet on an Android device (arm64-v8a).
Here is my config:
```
target = 'llvm -device=arm_cpu -target=aarch64-linux-android -mattr=+v8.2a,+dotprod'
```
First, I load the FP32 model with `mod, params = relay.frontend.from_onnx(onnx_model, input_shapes)`.
Then I convert the FP32 model to INT8 with TVM's own `relay.quantize`:
```
# Quantize with a fixed global scale (no calibration dataset needed)
with relay.quantize.qconfig(calibrate_mode='global_scale', global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)
```
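For reference, I then build, deploy, and time the model roughly like this (a minimal sketch; I assume a recent TVM where `relay.build` returns a graph-executor factory module, that `TVM_NDK_CC` points to the Android NDK clang, and that the RPC tracker address, the device key `android`, and the input name `data` match my setup):
```
import numpy as np
import tvm
from tvm import relay, rpc
from tvm.contrib import graph_executor, ndk

# Cross-compile for the aarch64 Android target and export a shared library
# (TVM_NDK_CC must point to the NDK clang so ndk.create_shared can link it).
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
lib.export_library("squeezenet.so", ndk.create_shared)

# Request the phone from the RPC tracker and load the library on the device.
tracker = rpc.connect_tracker("127.0.0.1", 9190)
remote = tracker.request("android", session_timeout=60)
remote.upload("squeezenet.so")
rlib = remote.load_module("squeezenet.so")

dev = remote.cpu(0)
module = graph_executor.GraphModule(rlib["default"](dev))
module.set_input("data", tvm.nd.array(
    np.random.uniform(size=(1, 3, 224, 224)).astype("float32")))

# Time the "run" function and report mean / std-dev in milliseconds.
ftimer = module.module.time_evaluator("run", dev, number=10, repeat=3)
res = np.array(ftimer().results) * 1000
print("Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(res), np.std(res)))
```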
Here we only focus on the INT8 speed-up compared to FP32, not on accuracy.
```
# Log for the quantized (INT8) model:
WARNING:autotvm:Cannot find config for target=llvm -device=arm_cpu -target=arm64-linux-android -mattr=+v8.2a,+dotprod, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 9, 9), 'int8'), ('TENSOR', (1000, 512, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'int32'). A fallback configuration is used, which may bring great performance regression.
Mean inference time (std dev): 24.38 ms (3.62 ms)

# Log for the FP32 model:
Cannot find config for target=llvm -device=arm_cpu -target=arm64-linux-android -mattr=+v8.2a,+dotprod, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 9, 9), 'float32'), ('TENSOR', (1000, 512, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Mean inference time (std dev): 17.50 ms (2.44 ms)
```
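I assume the "Cannot find config" warnings above mean these workloads have no tuned AutoTVM schedule yet, and that a tuning run roughly like the sketch below (adapted from the usual ARM CPU AutoTVM flow; the tracker host/port and the device key `android` are assumptions on my side) would remove them, but please correct me if that is wrong:
```
import tvm
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

log_file = "squeezenet_int8.log"

# Extract tunable conv2d/dense tasks from the (quantized) Relay module.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(build_func="ndk"),   # cross-compile with the Android NDK
    runner=autotvm.RPCRunner("android", host="127.0.0.1", port=9190,
                             number=10, timeout=10),  # measure on the phone via the tracker
)

for task in tasks:
    tuner = XGBTuner(task, loss_type="rank")
    tuner.tune(
        n_trial=min(1500, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

# Re-build with the tuned configs so the fallback warning disappears.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```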
You said there is a 1.3x - 1.5x performance speedup going from FP32 to INT8 for
rasp3 and rasp4, but here the INT8 model is actually slower than FP32.
Could you please give some advice on:
1) how to eliminate the "Cannot find config for ..." warning, and
2) how to achieve the 1.3x - 1.5x performance speedup? Could you share your
scripts so it is easy to test?
Thanks very much!