[TVM Discuss] [Questions] [ Bug ] The arm cpu performance of the new version of tvm is too low than the old version

ckh via TVM Discuss Mon, 06 Apr 2020 19:12:19 -0700


hello!


I am measuring performance on two rk3399 boards by running VGG-16 with an old 
version of TVM and the current version. The setup is as follows.

    rk3399 device 1 -> old version of TVM, Ubuntu 16.04, LLVM 8.0.0
    rk3399 device 2 -> new version of TVM, Ubuntu 18.04, LLVM 8.0.0


I tested both devices with the same code, shown below.

    import tvm
    import tvm.relay as relay
    #from tvm.contrib import graph_runtime
    from tvm.contrib.debugger import debug_runtime as graph_runtime
    import numpy as np
    import topi
    from tvm.relay.testing.temp_op_attr import TempOpAttr

    target_arm_cpu = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
    ctx_arm_cpu =  tvm.cpu()
    dtype='float32'
    batch_size = 1
    num_class = 1000
    image_shape = (3, 224, 224)
    data_shape = (batch_size,) + image_shape
    out_shape = (batch_size, num_class)
    mod, paramsO = relay.testing.vgg.get_workload(
        num_layers=16, batch_size=batch_size, image_shape=image_shape)
    opt_level = 3

    #arm_cpu 
    with relay.build_config(opt_level = opt_level):
        graph, lib, params = relay.build_module.build(mod, target_arm_cpu, params=paramsO)

    data = tvm.nd.array(np.random.uniform(-1, 1, size=data_shape).astype("float32"), ctx_arm_cpu)
    module = graph_runtime.create(graph, lib, ctx_arm_cpu)
    module.set_input("data", data)
    module.set_input(**params)
    module.run()

The results are below.

    rk3399 device 1: Mean inference time (std dev): 989.96 ms (0.80 ms)
    rk3399 device 2: Mean inference time (std dev): 1961.32 ms (2.55 ms)
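
To put the gap in numbers, here is a quick calculation using the two reported means. The variable names are only for illustration.

```python
# Mean inference times reported in this post, in milliseconds.
old_tvm_ms = 989.96   # rk3399 device 1 (old TVM)
new_tvm_ms = 1961.32  # rk3399 device 2 (new TVM)

# Ratio of the new build's latency to the old build's latency.
slowdown = new_tvm_ms / old_tvm_ms
print(f"new TVM takes {slowdown:.2f}x as long as old TVM")
```

So the new build takes roughly twice as long per inference on the same board model.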

I suspect the new version of TVM fails to pick up the pre-tuned configurations.
In the logs below, the new TVM reports a fallback for every conv2d workload as 
well as the dense layers, while the old TVM reports it only for the dense layers.

    [New Version TVM when compile vgg-16]
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_winograd.arm_cpu', ('TENSOR', (1, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 224, 224), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 112, 112), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 56, 56), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 56, 56), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 28, 28), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 28, 28), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 14, 14), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (1000, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (4096, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 25088), 'float32'), ('TENSOR', (4096, 25088), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

And the log from the old version is:

    [ old version of tvm when compile vgg-16 ]
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (1000, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (4096, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 25088, 'float32'), (4096, 25088, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
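
To make the difference between the two logs easier to see, here is a rough sketch that counts the fallback warnings per workload name. `fallback_counts` and `WORKLOAD_NAME` are hypothetical helpers written for this post, not part of TVM, and the inputs are short excerpts of the logs above rather than the full output.

```python
import re
from collections import Counter

# Captures the first element of the workload tuple, i.e. the operator or
# schedule name for which no tuned config was found.
WORKLOAD_NAME = re.compile(r"workload=\('([^']+)'")


def fallback_counts(log_text):
    """Count workload names appearing in 'Cannot find config' warnings."""
    return Counter(WORKLOAD_NAME.findall(log_text))


# Excerpts copied from the logs in this post; paste the full logs for full counts.
new_log_excerpt = """
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_winograd.arm_cpu', ('TENSOR', (1, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32').
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 224, 224), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32').
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (1000, 4096), 'float32'), None, 'float32').
"""

old_log_excerpt = """
Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (1000, 4096, 'float32'), 0, 'float32').
"""

new_counts = fallback_counts(new_log_excerpt)
old_counts = fallback_counts(old_log_excerpt)

# Print one line per workload name so the two builds can be compared side by side.
for label, counts in (("new", new_counts), ("old", old_counts)):
    for name, count in sorted(counts.items()):
        print(f"{label}: {name} x{count}")
```

On the full logs above, this would list the conv2d workloads only for the new version, and dense workloads for both.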

How can I solve this? Or do I have to do the tuning myself?





---
[Visit Topic](https://discuss.tvm.ai/t/bug-the-arm-cpu-performance-of-the-new-version-of-tvm-is-too-low-than-the-old-version/6245/1) to respond.
