@tkonolige Thank you for responding.
I just want to find out how much time is spent on data layout transformations
while running inference on ResNet-50. profiler_vm seems to report a much lower
inference cost (1) than debug_executor (2). Doesn't that contradict your
statement that profiler_vm may be slower than the graph executor?
I also ran a benchmark via `tvm.contrib.graph_executor`:
```python
import tvm
from tvm import autotvm, relay
from tvm.contrib import graph_executor as runtime

with autotvm.apply_graph_best(opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build_module.build(mod, target=target, params=params)

module = runtime.GraphModule(lib["default"](dev))
module.set_input("data", data)
print("Evaluate inference time cost...")
print(module.benchmark(dev, func_name="main", number=100,
                       repeat=3, end_to_end=True))
```
The inference cost I get this way (3) is always close to, but slightly lower
than, (1). Do you have any idea why that is?
The outputs:
(1) [profiler_vm]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86',
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None,
'float32') is missing in ApplyGraphBest context. A fallback configuration is
used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86',
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None,
'float32') is missing in ApplyGraphBest context. A fallback configuration is
used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better
performance. Use DEBUG logging level to see more details.
Name  Duration (us)  Percent  layout  Count  out_layout  Device  data_layout  kernel_layout  Hash  Argument Shapes  src_layout  dst_layout  weight_layout
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11  38,648.93  14.19  5  NCHW16c  cpu0  NCHW64c  OIHW64i16o  5c16c122a657ba21  float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6  31,069.39  11.41  4  NCHW8c  cpu0  NCHW16c  OIHW16i8o  f2c6de1cbe5c0ddb  float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 8], float32[1, 16, 28, 28, 8]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13  23,726.42  8.71  3  NCHW8c  cpu0  NCHW2c  OIHW2i8o  cb108aaf00eff9e2  float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10  18,153.16  6.66  5  NCHW8c  cpu0  NCHW1024c  OIHW1024i8o  e4cba4831bd46d2c  float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4  15,697.88  5.76  2  NCHW16c  cpu0  NCHW16c  OIHW16i16o  b2d690588ecaac96  float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]
fused_nn_contrib_conv2d_NCHWc_add_3  14,098.72  5.18  4  NCHW16c  cpu0  NCHW16c  OIHW16i16o  84bec82add215ebe  float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7  10,840.88  3.98  3  NCHW16c  cpu0  NCHW16c  OIHW16i16o  d930aa7bf46c34e1  float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]
fused_nn_contrib_conv2d_NCHWc_add_1  10,638.57  3.91  3  NCHW16c  cpu0  NCHW8c  OIHW8i16o  6beba43d92784786  float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 28, 28, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu  8,112.57  2.98  1  NCHW8c  cpu0  NCHW3c  OIHW3i8o  2f8575d36cac57f0  float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 112, 112, 8]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9  7,847.28  2.88  1  NCHW8c  cpu0  NCHW16c  OIHW16i8o  7baee5c8a4d8e4ab  float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2  7,684.11  2.82  1  NCHW16c  cpu0  NCHW32c  OIHW32i16o  25fd1c3d9d4e561e  float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]
fused_nn_contrib_conv2d_NCHWc_add  7,625.64  2.80  2  NCHW32c  cpu0  NCHW16c  OIHW16i32o  667036afd5deee1b  float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 56, 56, 32]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3  7,622.32  2.80  2  NCHW16c  cpu0  NCHW32c  OIHW32i16o  6e49d3c836077ac7  float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]
fused_nn_contrib_conv2d_NCHWc_2  7,530.83  2.76  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  b6e66601adaeb1e3  float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], float32[1, 64, 14, 14, 16]
fused_nn_contrib_conv2d_NCHWc_add_4  7,305.51  2.68  2  NCHW16c  cpu0  NCHW4c  OIHW4i16o  d0d1536228842867  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]
fused_nn_contrib_conv2d_NCHWc_3  7,303.69  2.68  1  NCHW8c  cpu0  NCHW1024c  OIHW1024i8o  493c374dd5e37c2b  float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], float32[1, 256, 7, 7, 8]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14  7,199.44  2.64  2  NCHW8c  cpu0  NCHW2048c  OIHW2048i8o  af5e7bf563de2757  float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]
fused_nn_contrib_conv2d_NCHWc_1  7,185.16  2.64  1  NCHW16c  cpu0  NCHW32c  OIHW32i16o  5e7a95757d65e24e  float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], float32[1, 32, 28, 28, 16]
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu  3,905.42  1.43  1  NCHW32c  cpu0  NCHW16c  OIHW16i32o  18ea4e7c768c292e  float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]
fused_nn_contrib_conv2d_NCHWc  3,776.76  1.39  1  NCHW32c  cpu0  NCHW8c  OIHW8i32o  7ff40af88acd710e  float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], float32[1, 8, 56, 56, 32]
fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu  3,693.25  1.36  1  NCHW16c  cpu0  NCHW4c  OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1  3,616.06  1.33  1  NCHW16c  cpu0  NCHW8c  OIHW8i16o  faa415ce8e443d42  float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]
fused_nn_contrib_conv2d_NCHWc_add_2  3,601.05  1.32  1  NCHW16c  cpu0  NCHW8c  OIHW8i16o  c3c48546ccd1c8e4  float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2  3,509.62  1.29  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  237b36f60eadc660  float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12  2,119.10  0.78  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  8d07031ff51d0737  float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8  1,969.95  0.72  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  8ec1781e87f7f62e  float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5  1,869.16  0.69  1  NCHW16c  cpu0  NCHW32c  OIHW32i16o  39975a03990f0ed6  float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1  920.43  0.34  1  NCHW32c  cpu0  NCHW8c  OIHW8i32o  ce29dd2da9289ac4  float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]
fused_add_nn_relu_layout_transform  814.00  0.30  5  cpu0  7590737f314ee1d9  float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 1, 14, 14, 1024]  NCHW16c  NCHW1024c
fused_add_nn_relu  751.40  0.28  2  cpu0  f6724216088f2bf7  float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]
fused_nn_contrib_dense_pack_add  658.90  0.24  1  cpu0  ced18cccebfa2ada  float32[1, 2048], float32[125, 2048, 8], float32[1, 1000], float32[1, 1000]  NC8n
fused_add_nn_relu_1  624.30  0.23  3  cpu0  848825acfc73218b  float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]
fused_nn_max_pool2d_add_nn_relu  378.72  0.14  NCHW8c  1  cpu0  4883943910905d24  float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 56, 56, 8]
fused_layout_transform  173.49  0.06  5  cpu0  0693edb3d97dc77f  float32[1, 32, 14, 14, 8], float32[1, 4, 14, 14, 64]  NCHW8c  NCHW64c
fused_add_nn_relu_layout_transform_1  172.54  0.06  2  cpu0  468080b095af509a  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 1, 7, 7, 2048]  NCHW16c  NCHW2048c
fused_layout_transform_3  138.92  0.05  1  cpu0  6dda5720a553f260  float32[1, 64, 14, 14, 16], float32[1, 1, 14, 14, 1024]  NCHW16c  NCHW1024c
fused_add_layout_transform  90.72  0.03  1  cpu0  69355d3cc810f874  float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]  NCHW  NCHW3c
fused_nn_global_avg_pool2d  83.09  0.03  NCHW16c  1  cpu0  f18307e2786f4cb3  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]
fused_layout_transform_4  79.73  0.03  1  cpu0  aad3e266e27c5054  float32[1, 256, 7, 7, 8], float32[1, 128, 7, 7, 16]  NCHW8c  NCHW16c
fused_layout_transform_2  50.75  0.02  3  cpu0  bd0b0c2ae84f7e09  float32[1, 64, 7, 7, 8], float32[1, 128, 7, 7, 4]  NCHW8c  NCHW4c
fused_layout_transform_5  39.88  0.01  2  cpu0  69f132fa7e1d6749  float32[1, 64, 7, 7, 8], float32[1, 256, 7, 7, 2]  NCHW8c  NCHW2c
fused_layout_transform_1  14.62  0.01  1  cpu0  9bd937910d443787  float32[1, 32, 7, 7, 16], float32[1, 256, 7, 7, 2]  NCHW16c  NCHW2c
fused_nn_softmax  7.80  0.00  1  cpu0  ca61e79ea24e53f0  float32[1, 1000], float32[1, 1000]
fused_layout_transform_nn_batch_flatten  1.41  0.00  1  cpu0  2db99463d18696a4  float32[1, 128, 1, 1, 16], float32[1, 2048]  NCHW16c  NCHW
----------
Sum  2,71,351.60  99.61  84
Total  2,72,418.15  1  cpu0
```
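To make the original question concrete, here is how I read output (1): summing every row whose name contains `layout_transform` gives a rough upper bound on the time spent in layout transformations (an upper bound because the first three entries are fused with add/relu work). The durations below are copied from the table above; only the grouping into a dict is mine:

```python
# Durations (us) of all layout_transform kernels in output (1).
durations_us = {
    "fused_add_nn_relu_layout_transform": 814.00,      # fused with add/relu
    "fused_add_nn_relu_layout_transform_1": 172.54,    # fused with add/relu
    "fused_add_layout_transform": 90.72,               # fused with add
    "fused_layout_transform": 173.49,
    "fused_layout_transform_3": 138.92,
    "fused_layout_transform_4": 79.73,
    "fused_layout_transform_2": 50.75,
    "fused_layout_transform_5": 39.88,
    "fused_layout_transform_1": 14.62,
    "fused_layout_transform_nn_batch_flatten": 1.41,
}
total_us = 272_418.15  # "Total" row of output (1)

transform_us = sum(durations_us.values())
share = 100 * transform_us / total_us
print(f"{transform_us:.2f} us -> {share:.2f}% of total")
```

So by this reading, layout transformations account for well under 1% of the profiled time, which is why I want to be sure the profiler numbers themselves are trustworthy.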
(2) [debug_executor]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86',
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None,
'float32') is missing in ApplyGraphBest context. A fallback configuration is
used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86',
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None,
'float32') is missing in ApplyGraphBest context. A fallback configuration is
used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better
performance. Use DEBUG logging level to see more details.
Name  Duration (us)  Percent  layout  Count  out_layout  Device  data_layout  kernel_layout  Hash  Argument Shapes  src_layout  dst_layout  weight_layout
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14  5,68,263.92  48.76  1  NCHW8c  cpu0  NCHW3c  OIHW3i8o  2f8575d36cac57f0  float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 112, 112, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2  1,75,988.78  15.10  1  NCHW32c  cpu0  NCHW16c  OIHW16i32o  18ea4e7c768c292e  float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10  82,241.79  7.06  2  NCHW16c  cpu0  NCHW16c  OIHW16i16o  b2d690588ecaac96  float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc  67,905.70  5.83  1  NCHW32c  cpu0  NCHW8c  OIHW8i32o  7ff40af88acd710e  float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], float32[1, 8, 56, 56, 32]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3  39,639.91  3.40  5  NCHW16c  cpu0  NCHW64c  OIHW64i16o  5c16c122a657ba21  float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7  31,242.14  2.68  4  NCHW8c  cpu0  NCHW16c  OIHW16i8o  f2c6de1cbe5c0ddb  float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 8], float32[1, 16, 28, 28, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_4  29,317.11  2.52  2  NCHW32c  cpu0  NCHW16c  OIHW16i32o  667036afd5deee1b  float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 56, 56, 32]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu  23,174.76  1.99  3  NCHW8c  cpu0  NCHW2c  OIHW2i8o  cb108aaf00eff9e2  float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4  18,815.10  1.61  5  NCHW8c  cpu0  NCHW1024c  OIHW1024i8o  e4cba4831bd46d2c  float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_1  14,143.22  1.21  4  NCHW16c  cpu0  NCHW16c  OIHW16i16o  84bec82add215ebe  float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8  10,807.49  0.93  3  NCHW16c  cpu0  NCHW16c  OIHW16i16o  d930aa7bf46c34e1  float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3  10,635.87  0.91  3  NCHW16c  cpu0  NCHW8c  OIHW8i16o  6beba43d92784786  float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 28, 28, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13  8,887.56  0.76  1  NCHW32c  cpu0  NCHW8c  OIHW8i32o  ce29dd2da9289ac4  float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11  8,865.53  0.76  2  NCHW16c  cpu0  NCHW32c  OIHW32i16o  6e49d3c836077ac7  float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5  8,017.28  0.69  1  NCHW8c  cpu0  NCHW16c  OIHW16i8o  7baee5c8a4d8e4ab  float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12  7,585.56  0.65  1  NCHW16c  cpu0  NCHW32c  OIHW32i16o  25fd1c3d9d4e561e  float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_2  7,442.40  0.64  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  b6e66601adaeb1e3  float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], float32[1, 64, 14, 14, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1  7,293.48  0.63  2  NCHW8c  cpu0  NCHW2048c  OIHW2048i8o  af5e7bf563de2757  float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_3  7,140.03  0.61  1  NCHW8c  cpu0  NCHW1024c  OIHW1024i8o  493c374dd5e37c2b  float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], float32[1, 256, 7, 7, 8]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_1  7,041.60  0.60  1  NCHW16c  cpu0  NCHW32c  OIHW32i16o  5e7a95757d65e24e  float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], float32[1, 32, 28, 28, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add  6,836.18  0.59  2  NCHW16c  cpu0  NCHW4c  OIHW4i16o  d0d1536228842867  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_2  3,727.41  0.32  1  NCHW16c  cpu0  NCHW8c  OIHW8i16o  c3c48546ccd1c8e4  float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1  3,596.31  0.31  1  NCHW16c  cpu0  NCHW8c  OIHW8i16o  faa415ce8e443d42  float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu  3,468.59  0.30  1  NCHW16c  cpu0  NCHW4c  OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu  3,440.23  0.30  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  237b36f60eadc660  float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9  3,144.19  0.27  1  NCHW16c  cpu0  NCHW32c  OIHW32i16o  39975a03990f0ed6  float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2  1,997.84  0.17  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  8d07031ff51d0737  float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6  1,783.56  0.15  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  8ec1781e87f7f62e  float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]
tvmgen_default_fused_add_nn_relu_1  473.00  0.04  2  cpu0  f6724216088f2bf7  float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]
tvmgen_default_fused_add_nn_relu  338.92  0.03  3  cpu0  848825acfc73218b  float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]
tvmgen_default_fused_add_nn_relu_layout_transform_1  286.62  0.02  5  cpu0  7590737f314ee1d9  float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 1, 14, 14, 1024]  NCHW16c  NCHW1024c
tvmgen_default_fused_nn_contrib_dense_pack_add  265.74  0.02  1  cpu0  ced18cccebfa2ada  float32[1, 2048], float32[125, 2048, 8], float32[1, 1000], float32[1, 1000]  NC8n
tvmgen_default_fused_nn_max_pool2d_add_nn_relu  251.56  0.02  NCHW8c  1  cpu0  4883943910905d24  float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 56, 56, 8]
tvmgen_default_fused_layout_transform_3  132.62  0.01  5  cpu0  0693edb3d97dc77f  float32[1, 32, 14, 14, 8], float32[1, 4, 14, 14, 64]  NCHW8c  NCHW64c
tvmgen_default_fused_nn_global_avg_pool2d  69.42  0.01  NCHW16c  1  cpu0  f18307e2786f4cb3  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]
tvmgen_default_fused_layout_transform_4  60.92  0.01  1  cpu0  aad3e266e27c5054  float32[1, 256, 7, 7, 8], float32[1, 128, 7, 7, 16]  NCHW8c  NCHW16c
tvmgen_default_fused_layout_transform_5  58.94  0.01  1  cpu0  6dda5720a553f260  float32[1, 64, 14, 14, 16], float32[1, 1, 14, 14, 1024]  NCHW16c  NCHW1024c
tvmgen_default_fused_add_layout_transform  56.01  0.00  1  cpu0  69355d3cc810f874  float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]  NCHW  NCHW3c
tvmgen_default_fused_add_nn_relu_layout_transform  54.40  0.00  2  cpu0  468080b095af509a  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 1, 7, 7, 2048]  NCHW16c  NCHW2048c
tvmgen_default_fused_layout_transform_1  42.67  0.00  2  cpu0  69f132fa7e1d6749  float32[1, 64, 7, 7, 8], float32[1, 256, 7, 7, 2]  NCHW8c  NCHW2c
tvmgen_default_fused_layout_transform  33.90  0.00  3  cpu0  bd0b0c2ae84f7e09  float32[1, 64, 7, 7, 8], float32[1, 128, 7, 7, 4]  NCHW8c  NCHW4c
tvmgen_default_fused_layout_transform_2  19.34  0.00  1  cpu0  9bd937910d443787  float32[1, 32, 7, 7, 16], float32[1, 256, 7, 7, 2]  NCHW16c  NCHW2c
tvmgen_default_fused_nn_softmax  7.03  0.00  1  cpu0  ca61e79ea24e53f0  float32[1, 1000], float32[1, 1000]
tvmgen_default_fused_layout_transform_nn_batch_flatten  0.96  0.00  1  cpu0  2db99463d18696a4  float32[1, 128, 1, 1, 16], float32[1, 2048]  NCHW16c  NCHW
----------
Sum  11,64,595.59  99.94  84
Total  11,65,326.68  1  cpu0
```
(3) [benchmark]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86',
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None,
'float32') is missing in ApplyGraphBest context. A fallback configuration is
used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86',
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None,
'float32') is missing in ApplyGraphBest context. A fallback configuration is
used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better
performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
269.9458 270.0297 270.0697 269.7381 0.1478
```
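As a sanity check on output (3): with `repeat=3` the three per-repeat means can be read straight off the summary line (they are exactly the min, median, and max), and the mean and std columns reproduce if std is taken as the population standard deviation over the repeats. A minimal sketch of that aggregation (pure Python, no TVM needed):

```python
import statistics

def summarize(times_ms):
    """Summary statistics over per-repeat timings, each repeat already
    averaged over the `number` inner runs. Uses population std."""
    return {
        "mean": statistics.mean(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
        "min": min(times_ms),
        "std": statistics.pstdev(times_ms),
    }

# The three repeat means recovered from output (3): min, median, max.
repeats_ms = [270.0297, 270.0697, 269.7381]
print({k: round(v, 4) for k, v in summarize(repeats_ms).items()})
```

This reproduces the mean of 269.9458 ms and std of 0.1478 ms shown above, so the summary in (3) is internally consistent; the question is only why it sits slightly below the profiler_vm total in (1).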
---
[Visit
Topic](https://discuss.tvm.apache.org/t/difference-in-profiler-outputs/11255/3)
to respond.