The use case is effectively implementing a circular buffer with
`concat` + `slice`. Here is the code:
```
from mxnet import gluon, nd, profiler

profiler.set_config(profile_all=True, aggregate_stats=True,
                    filename='/home/ec2-user/src/mkl_slice_op_profile.json')


class TestBlock(gluon.HybridBlock):
    def __init__(self):
        super(TestBlock, self).__init__()
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(512, kernel_size=(1, 3), dilation=512)

    def hybrid_forward(self, F, x):
        out = self.conv(x)
        # Append the conv output along the width, then keep the last 1025 columns.
        x = F.concat(x, out, dim=3)
        x = F.slice_axis(x, axis=3, begin=-1025, end=None)
        # x = F.slice(x, begin=(None, None, None, -1025), end=(None, None, None, None))
        return x


x = nd.random.uniform(shape=(32, 512, 1, 1025))
net = TestBlock()
net.initialize()
net.hybridize(static_alloc=True, static_shape=True)
x = net(x)  # first call, made before profiling starts

profiler.set_state('run')
for _ in range(100):
    x = net(x)
nd.waitall()
profiler.set_state('stop')
profiler.dump()
print(profiler.dumps(reset=True))
exit(0)
```
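To make the buffer semantics explicit, here is a minimal NumPy sketch of the same `concat` + `slice` step. It is only an illustration, not the MXNet code path being profiled; `roll_buffer` and `WINDOW` are names made up for this example. With `kernel_size=(1, 3)` and `dilation=512` the convolution spans 1025 columns, so on a width-1025 input it should produce a single new column, which is what the sketch assumes.

```python
import numpy as np

WINDOW = 1025  # number of columns kept along the last axis (hypothetical name)


def roll_buffer(buf, new):
    """Append `new` along the last axis and keep only the newest WINDOW columns."""
    joined = np.concatenate([buf, new], axis=-1)  # width grows to WINDOW + new width
    return joined[..., -WINDOW:]                  # drop the oldest columns


# Small stand-in shapes; the issue uses (32, 512, 1, 1025).
buf = np.arange(WINDOW, dtype=np.float32).reshape(1, 1, 1, WINDOW)
new = np.full((1, 1, 1, 1), -1.0, dtype=np.float32)  # stands in for the conv output

out = roll_buffer(buf, new)
print(out.shape, out[0, 0, 0, 0], out[0, 0, 0, -1])
```

Each step copies the whole buffer twice (once in the concatenate, once in the slice), which is why both operators show up in the profiles.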
And here are the interesting profiling results.
1. Profile with the mxnet package and the `slice_axis` operator, i.e. the code as
posted (**no MKL**). To profile `slice` instead, comment out the `slice_axis` line in
`hybrid_forward()` and uncomment the `slice` line.
```
operator
=================
Name            Total Count   Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----            -----------   ---------  -------------  -------------  -------------
slice_axis              200   4048.8311        20.1010        20.3790        20.2442
Concat                  200  17641.7461        88.0750        89.5890        88.2087
Convolution             200   2944.2839        14.5890        14.8890        14.7214
DeleteVariable          206    517.0800         0.0030         2.6670         2.5101
```
2. Profile with mxnet package and `slice` operator (**no MKL**) (**Consistently
performs ~2% better than `slice_axis`!!**)
```
operator
=================
Name            Total Count   Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----            -----------   ---------  -------------  -------------  -------------
slice                   200   3938.1279        19.5190        19.9520        19.6906
Concat                  200  17636.0566        88.0600        88.7120        88.1803
Convolution             200   2945.0759        14.5760        14.8420        14.7254
DeleteVariable          206    521.2870         0.0030         2.6960         2.5305
```
3. Profile with mxnet-mkl package and `slice_axis` operator (**with MKLDNN**)
```
operator
=================
Name Total Count Time (ms) Min Time (ms)
Max Time (ms) Avg Time (ms)
---- ----------- --------- -------------
------------- -------------
Reorder 202 2.9610 0.0000
1.3190 0.0147
slice_axis 200 4979.5488 24.6100
26.1240 24.8977
Concat 200 881.7350 4.3000
4.5370 4.4087
Convolution 200 1231.0720 5.9080
11.6130 6.1554
DeleteVariable 408 982.9400 0.0030
2.8100 2.4092
```
4. Profile with mxnet-mkl package and `slice` operator (**with MKLDNN**)
```
operator
=================
Name Total Count Time (ms) Min Time (ms)
Max Time (ms) Avg Time (ms)
---- ----------- --------- -------------
------------- -------------
Reorder 202 2.8510 0.0000
1.2710 0.0141
slice 200 5012.6240 24.8500
27.0280 25.0631
Concat 200 880.1710 4.2900
4.5270 4.4009
Convolution 200 1252.7841 5.9060
11.7800 6.2639
DeleteVariable 408 970.0030 0.0040
2.8370 2.3775
```
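For a quick side-by-side comparison, the snippet below computes relative differences from the average times in the four profiles above. The numbers are copied from those tables; the helper `pct_change` and the variable names are just for this illustration, and a single run per configuration is not a rigorous benchmark.

```python
# Average times (ms) copied from the four profiles above.
avg_ms = {
    ("mxnet", "slice_axis"): 20.2442,
    ("mxnet", "slice"): 19.6906,
    ("mxnet-mkl", "slice_axis"): 24.8977,
    ("mxnet-mkl", "slice"): 25.0631,
    ("mxnet", "Concat"): 88.2087,      # profile 1
    ("mxnet-mkl", "Concat"): 4.4087,   # profile 3
}


def pct_change(old, new):
    """Relative change from `old` to `new`, in percent (negative means faster)."""
    return 100.0 * (new - old) / old


# slice vs. slice_axis without MKLDNN
slice_vs_axis = pct_change(avg_ms[("mxnet", "slice_axis")], avg_ms[("mxnet", "slice")])
# Same operator, mxnet-mkl vs. mxnet
axis_mkl = pct_change(avg_ms[("mxnet", "slice_axis")], avg_ms[("mxnet-mkl", "slice_axis")])
slice_mkl = pct_change(avg_ms[("mxnet", "slice")], avg_ms[("mxnet-mkl", "slice")])
# Concat speedup factor with mxnet-mkl
concat_speedup = avg_ms[("mxnet", "Concat")] / avg_ms[("mxnet-mkl", "Concat")]

print(f"slice vs slice_axis (no MKL): {slice_vs_axis:+.1f}%")
print(f"slice_axis, mkl vs no MKL:    {axis_mkl:+.1f}%")
print(f"slice, mkl vs no MKL:         {slice_mkl:+.1f}%")
print(f"Concat speedup with mkl:      {concat_speedup:.1f}x")
```

On these numbers, `Concat` is roughly 20x faster with mxnet-mkl, while both slice operators are roughly 23-27% slower.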
[ Full content available at:
https://github.com/apache/incubator-mxnet/issues/12303 ]