fhieber opened a new issue #20636:
URL: https://github.com/apache/incubator-mxnet/issues/20636


   ## Description
   We observe a significant reduction in 
[Sockeye](https://github.com/awslabs/sockeye) inference speed with a recent 
build of MXNet 2.x (master branch). Compared to 1.x versions of MXNet, GPU 
translation with MXNet 2.x is **~2x slower** (between roughly 1.5x and 2.8x in 
the runs below, depending on instance type and batch size).
   
   For MXNet 2.x, we migrated Sockeye to the Gluon 2.0 interface and adopted 
the new NumPy namespaces. Otherwise, the code is equivalent to the `master` 
branch, with the same level of hybridization (`static_alloc=True`) in both 
branches. The pull request/branch can be found here: 
https://github.com/awslabs/sockeye/pull/953.
   
   The runs below use half-precision and were performed on a p3.2xlarge and a 
g4 instance. The translation outputs of both MXNet versions are identical.
   
   
   ### p3.2xlarge instance
   #### batch size 64
   `mxnet-cu112 2.0.0b20211001`:
   ```
   [INFO:__main__] Processed 3003 lines. Total time: 37.2888, sec/sent: 0.0124, 
sent/sec: 80.5336
   ```
   `mxnet-cu112 1.7`:
   ```
   [INFO:__main__] Processed 3003 lines. Total time: 20.2805, sec/sent: 0.0068, 
sent/sec: 148.0735
   ```
   
   #### batch size 1
   `mxnet-cu112 2.0.0b20211001`:
   ```
   [INFO:__main__] Processed 3003 lines. Total time: 858.3818, sec/sent: 
0.2858, sent/sec: 3.4984
   ```
   `mxnet-cu112 1.7`:
   ```
   [INFO:__main__] Processed 3003 lines. Total time: 302.0189, sec/sent: 
0.1006, sent/sec: 9.9431
   ```
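
For reference, the slowdown factors implied by the p3.2xlarge numbers above can be computed directly from the reported total times. This is only a small sketch: the dictionary layout and the `slowdown` helper are illustrative names, not part of Sockeye or MXNet.

```python
# Total translation times in seconds, copied from the p3.2xlarge logs above.
# Keys are (batch size, MXNet version); the layout is only for illustration.
total_time = {
    (64, "2.0.0b20211001"): 37.2888,
    (64, "1.7"): 20.2805,
    (1, "2.0.0b20211001"): 858.3818,
    (1, "1.7"): 302.0189,
}


def slowdown(batch_size, new="2.0.0b20211001", old="1.7"):
    """Return how many times longer the new build takes than the old one."""
    return total_time[(batch_size, new)] / total_time[(batch_size, old)]


for bs in (64, 1):
    print(f"batch size {bs}: {slowdown(bs):.2f}x slower")
```

With these inputs the ratio is about 1.84x at batch size 64 and about 2.84x at batch size 1, so the regression is more pronounced for small batches.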
   
   ### g4 instance
   ```
   mx18/out.1.bpe.log:[2021-10-04:20:02:32:INFO:__main__:read_and_translate] 
Processed 3003 lines. Total time: 316.4692, sec/sent: 0.1054, sent/sec: 9.4891
   mx18/out.64.bpe.log:[2021-10-04:20:03:10:INFO:__main__:read_and_translate] 
Processed 3003 lines. Total time: 31.8175, sec/sent: 0.0106, sent/sec: 94.3819
   mx20/out.1.bpe.log:[2021-10-04:20:17:32:INFO:__main__:read_and_translate] 
Processed 3003 lines. Total time: 714.5509, sec/sent: 0.2379, sent/sec: 4.2026
   mx20/out.64.bpe.log:[2021-10-04:20:18:26:INFO:__main__:read_and_translate] 
Processed 3003 lines. Total time: 46.4607, sec/sent: 0.0155, sent/sec: 64.6352
   ```
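
The g4 lines above can also be compared programmatically. The sketch below parses log lines of the shape shown and reports the ratio of total times per batch size. It assumes that the `mx18` and `mx20` directories hold the MXNet 1.8 and MXNet 2.x runs respectively (inferred from the Environment section, not stated explicitly), and the regular expression and `parse` helper are hypothetical names introduced for illustration.

```python
import re

# The four g4 log lines from above, rejoined onto single lines.
LOG = """\
mx18/out.1.bpe.log:[2021-10-04:20:02:32:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 316.4692, sec/sent: 0.1054, sent/sec: 9.4891
mx18/out.64.bpe.log:[2021-10-04:20:03:10:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 31.8175, sec/sent: 0.0106, sent/sec: 94.3819
mx20/out.1.bpe.log:[2021-10-04:20:17:32:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 714.5509, sec/sent: 0.2379, sent/sec: 4.2026
mx20/out.64.bpe.log:[2021-10-04:20:18:26:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 46.4607, sec/sent: 0.0155, sent/sec: 64.6352
"""

# Captures the run directory, the batch size from the file name,
# the total time, and the sentences-per-second rate.
PATTERN = re.compile(
    r"^(?P<build>[^/]+)/out\.(?P<batch>\d+)\.bpe\.log:.*"
    r"Total time: (?P<total>[\d.]+), sec/sent: [\d.]+, "
    r"sent/sec: (?P<rate>[\d.]+)"
)


def parse(log_text):
    """Map (build, batch size) -> (total seconds, sentences per second)."""
    results = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            key = (match["build"], int(match["batch"]))
            results[key] = (float(match["total"]), float(match["rate"]))
    return results


results = parse(LOG)
for batch in (1, 64):
    old_total, _ = results[("mx18", batch)]
    new_total, _ = results[("mx20", batch)]
    ratio = new_total / old_total
    print(f"batch size {batch}: mx20 takes {ratio:.2f}x the time of mx18")
```

On the g4 numbers this gives about 2.26x at batch size 1 and about 1.46x at batch size 64, the same pattern as on the p3.2xlarge.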
   
   ## To Reproduce
   - Download the Sockeye sample model
   - Run `translate.sh` with the `master` branch of Sockeye
   - Run `translate.sh` with the `mx2` branch of Sockeye
   
   ### Steps to reproduce
   1. `wget https://github.com/awslabs/sockeye/releases/download/2.3.22/wmt14_en_de.tgz`
   2. `tar -xvf wmt14_en_de.tgz`
   3. `git clone https://github.com/awslabs/sockeye.git`
   4. `pip install -r sockeye/requirements/requirements.gpu-cu112.txt`
   5. `mv sockeye/sockeye wmt_14_en_de`
   6. `cd wmt_14_en_de`
   7. `bash translate.sh` [translate with master branch]
   8. `git checkout mx2`
   9. Install a nightly build of MXNet 2.x: `pip uninstall mxnet-cu112 ; pip install --pre -f https://dist.mxnet.io/python 'mxnet-cu112'`
   10. `bash translate.sh` [translate with mx2 branch]
   
   ## What have you tried to solve it?
   -
   
   ## Environment
   - CUDA 11.2 (`conda install -c conda-forge nccl cudnn cudatoolkit==11.2`)
   - MXNet 1.8.post0 or MXNet 1.7, compared against MXNet 2.x (`2.0.0b20211001`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


