Hi all,
I have two containers:

1. One running Python 2 and MXNet 1.1
2. An updated container running Python 3 and MXNet 1.4

I have observed significant performance regressions in the py3/MXNet 1.4.1 container, which is built with MKL-DNN enabled. As a minimal reproducible example, I am using the code in this repo: https://github.com/opringle/multivariate_time_series_forecasting

I used the profiler in each container to capture the second training batch, roughly like this:

```
i = 0
for batch in train_iter:
    start_time = time.time()
    if i == 1:
        profiler.set_state('run')
    module.forward(batch, is_train=True)
    module.backward()
    mx.nd.waitall()
    if i == 1:
        profiler.set_state('stop')
        profiler.dump()
    i += 1  # advance the batch counter so the profiled branch is reached
```

This is the profiler output sorted by total op time for the py2/1.1 container:

Same for the py3/1.4.1 container: (uploading as a reply due to a new-user restriction)

Some ops, such as `backward_Convolution`, are significantly slower. My machine's CPU is a 6-core Intel i7. Does anyone know whether this regression is operator-specific, or know a method to determine if it is? Is this issue related to MKL-DNN somehow?

Other context:

- Due to how our code is currently structured in my org, it is quite difficult to upgrade to 1.5+.
- When I run the same example with the same containers on a machine with an Intel Xeon CPU (a c5 instance on AWS), the opposite occurs: the py3/1.4.1 container is much faster per batch than the py2/1.1 container.

---

[Visit Topic](https://discuss.mxnet.apache.org/t/performance-regression-in-1-4/6790/1) to respond.
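P.S. In case it helps whoever digs into this: since `profiler.dump()` writes a Chrome-trace JSON file, the two runs can be compared op-by-op with a short stdlib script. This is just a sketch under assumptions: the helper names `op_totals`/`top_ops` are mine, and I'm assuming the trace contains begin/end event pairs (`"ph": "B"`/`"E"`) with microsecond timestamps (worth checking against your own `profile.json`).

```python
# Sketch: aggregate per-op time from an MXNet profiler dump (Chrome trace
# format). Assumes "B"/"E" begin/end pairs with microsecond "ts" fields;
# also handles "X" complete events, which carry their own "dur".
import json
from collections import defaultdict

def op_totals(trace_path):
    """Return {op_name: total microseconds} from a profiler trace file."""
    with open(trace_path) as f:
        events = json.load(f)["traceEvents"]
    totals = defaultdict(float)
    open_ts = {}  # (name, tid) -> start timestamp of the open interval
    for ev in events:
        key = (ev.get("name"), ev.get("tid"))
        if ev.get("ph") == "B":
            open_ts[key] = ev["ts"]
        elif ev.get("ph") == "E" and key in open_ts:
            totals[ev["name"]] += ev["ts"] - open_ts.pop(key)
        elif ev.get("ph") == "X":  # complete event: duration is explicit
            totals[ev["name"]] += ev.get("dur", 0)
    return dict(totals)

def top_ops(trace_path, n=10):
    """Top-n ops by total time, mirroring the 'total op time' sort above."""
    return sorted(op_totals(trace_path).items(), key=lambda kv: -kv[1])[:n]
```

Running `top_ops` on the dump from each container and diffing the two lists should show whether the regression is concentrated in a few ops (e.g. `backward_Convolution`) or spread across the board.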
