Hello!

I am using RK3399 boards to compare VGG-16 inference performance between an old version of TVM and the current version. The setup is:

- RK3399 device 1: old version of TVM, Ubuntu 16.04, LLVM 8.0.0
- RK3399 device 2: new version of TVM, Ubuntu 18.04, LLVM 8.0.0

I ran the same code on both boards:
```python
import tvm
import tvm.relay as relay
#from tvm.contrib import graph_runtime
# debug_runtime has the graph_runtime interface and also reports per-operator timings
from tvm.contrib.debugger import debug_runtime as graph_runtime
import numpy as np
import topi
from tvm.relay.testing.temp_op_attr import TempOpAttr

target_arm_cpu = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
ctx_arm_cpu = tvm.cpu()
dtype = 'float32'
batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

# VGG-16 with randomly initialized weights from relay's test workloads
mod, paramsO = relay.testing.vgg.get_workload(
    num_layers=16, batch_size=batch_size, image_shape=image_shape)
opt_level = 3

# arm_cpu
with relay.build_config(opt_level=opt_level):
    graph, lib, params = relay.build_module.build(mod, target_arm_cpu, params=paramsO)

data = tvm.nd.array(np.random.uniform(-1, 1, size=data_shape).astype("float32"), ctx_arm_cpu)
module = graph_runtime.create(graph, lib, ctx_arm_cpu)
module.set_input("data", data)
module.set_input(**params)
module.run()
```
The results are:

- RK3399 device 1 (old TVM): Mean inference time (std dev): 989.96 ms (0.80 ms)
- RK3399 device 2 (new TVM): Mean inference time (std dev): 1961.32 ms (2.55 ms)

So the new version is roughly twice as slow.
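The timing code itself is not part of the script above. The result lines appear to follow the format printed by the TVM tutorials, which (this is an assumption about how these numbers were produced) time the module with something like `module.module.time_evaluator("run", ctx, number=1, repeat=10)` and convert the per-run results from seconds to milliseconds. Under that assumption, the summary line and the slowdown between the two boards can be reproduced in plain Python; `summarize_ms` and `slowdown` are hypothetical helpers, not TVM functions:

```python
import statistics


def summarize_ms(run_times_s):
    """Format per-run latencies given in seconds like the result lines above:
    mean and population standard deviation, both in milliseconds."""
    ms = [t * 1000.0 for t in run_times_s]
    mean = statistics.mean(ms)
    std = statistics.pstdev(ms)  # population std dev, matching numpy.std's default
    return "Mean inference time (std dev): %.2f ms (%.2f ms)" % (mean, std)


def slowdown(old_mean_ms, new_mean_ms):
    """Ratio of new to old mean latency; values above 1 mean the new build is slower."""
    return new_mean_ms / old_mean_ms


if __name__ == "__main__":
    # Mean latencies reported above for device 1 (old TVM) and device 2 (new TVM).
    print("slowdown: %.2fx" % slowdown(989.96, 1961.32))
```

With the reported means, `slowdown(989.96, 1961.32)` evaluates to about 1.98.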
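I suspect the new version of TVM cannot find the pre-tuned configurations. The compilation warnings differ between the two versions: the new version falls back to a default configuration for every conv2d workload and for the three dense workloads, whereas the old version falls back only for the three dense workloads.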
[New version of TVM, compiling VGG-16]

```
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_winograd.arm_cpu', ('TENSOR', (1, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 224, 224), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 112, 112), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 56, 56), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 56, 56), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 28, 28), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 28, 28), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 14, 14), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (1000, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (4096, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 25088), 'float32'), ('TENSOR', (4096, 25088), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
```
And here is the output from the old version:

[Old version of TVM, compiling VGG-16]

```
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (1000, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (4096, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 25088, 'float32'), (4096, 25088, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
```
How can I solve this? Or do I have to run the tuning myself?
---
[Visit
Topic](https://discuss.tvm.ai/t/bug-the-arm-cpu-performance-of-the-new-version-of-tvm-is-too-low-than-the-old-version/6245/1)
to respond.