Hello.

On the RK3399, I found a performance regression during inference with the VGG-16
model.

Performance was measured using the test code below.

    import numpy as np
    import tvm
    import tvm.relay as relay
    import tvm.relay.testing  # needed so relay.testing.vgg is available
    from tvm.contrib import graph_runtime

    target_arm_cpu = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
    ctx_arm_cpu = tvm.cpu()
    dtype = 'float32'
    batch_size = 1
    num_class = 1000
    image_shape = (3, 224, 224)
    data_shape = (batch_size,) + image_shape
    out_shape = (batch_size, num_class)
    mod, paramsO = relay.testing.vgg.get_workload(
        num_layers=16, batch_size=batch_size, image_shape=image_shape)
    opt_level = 3

    # arm_cpu
    with relay.build_config(opt_level=opt_level):
        graph, lib, params = relay.build_module.build(mod, target_arm_cpu, params=paramsO)

    data = tvm.nd.array(np.random.uniform(-1, 1, size=data_shape).astype(dtype), ctx_arm_cpu)
    module = graph_runtime.create(graph, lib, ctx_arm_cpu)
    module.set_input("data", data)
    module.set_input(**params)
    module.run()
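
The timing lines below are in the format printed by TVM's `time_evaluator` benchmarking helper. As a rough stand-in, the same mean/std statistics can be computed from raw per-run timings like this (the `report` helper is my own illustration for this post, not a TVM API):

```python
import statistics

def report(times_ms):
    # Summarize per-run wall-clock timings (in milliseconds) in the same
    # "Mean inference time (std dev)" style that time_evaluator reports.
    mean = statistics.mean(times_ms)
    std = statistics.pstdev(times_ms)
    return "Mean inference time (std dev): %.2f ms (%.2f ms)" % (mean, std)

print(report([989.1, 990.5, 990.3]))
```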

When running VGG-16 on the Arm CPU with the current TVM version, the performance
is as follows.

`Mean inference time (std dev): 1892.25 ms (2.20 ms)`

and with the old TVM version it is

`Mean inference time (std dev): 989.96 ms (0.80 ms)`

The performance difference between the new and old versions is nearly 2x, which
seems far too large.

The new version of TVM does not seem to find a tuned config for VGG-16.
Below is the log from compiling the VGG-16 model with Relay.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_winograd.arm_cpu', ('TENSOR', (1, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 224, 224), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 112, 112), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 56, 56), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 56, 56), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 28, 28), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 28, 28), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 14, 14), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (1000, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (4096, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 25088), 'float32'), ('TENSOR', (4096, 25088), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

And below is the log from compiling VGG-16 with the old TVM.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (1000, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (4096, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 25088, 'float32'), (4096, 25088, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.

As you can see from the logs, the old version of TVM shows no fallback configs
for conv2d, but the new version falls back for every conv2d layer.

I think the current version of TVM cannot find the conv2d config, and that this
causes the performance degradation. Is this intended, or is it an internal TVM
problem?





---
[Visit Topic](https://discuss.tvm.ai/t/the-current-version-of-tvm-cannot-find-the-configuration-of-conv2d/6277/1) to respond.
