fhieber opened a new issue #20636: URL: https://github.com/apache/incubator-mxnet/issues/20636
## Description

We observe a significant reduction in [Sockeye](https://github.com/awslabs/sockeye) inference speed with a recent build of MXNet 2.x (master branch). Compared to 1.x versions of MXNet, GPU translation with MXNet 2.x is **~2x slower**.

For MXNet 2.x, we migrated Sockeye to the Gluon 2.0 interface and adopted the new NumPy namespaces. Otherwise, the code is equivalent to master, with the same level of hybridization (`static_alloc=True`) in both branches. The pull request/branch can be found here: https://github.com/awslabs/sockeye/pull/953.

The runs below use half precision and run on a p3.2xlarge. Outputs are equal.

### p3.2xlarge instance

#### batch size 64

`mxnet-cu112 2.0.0b20211001`:
```
[INFO:__main__] Processed 3003 lines. Total time: 37.2888, sec/sent: 0.0124, sent/sec: 80.5336
```

`mxnet-cu112 1.7`:
```
[INFO:__main__] Processed 3003 lines. Total time: 20.2805, sec/sent: 0.0068, sent/sec: 148.0735
```

#### batch size 1

`mxnet-cu112 2.0.0b20211001`:
```
[INFO:__main__] Processed 3003 lines. Total time: 858.3818, sec/sent: 0.2858, sent/sec: 3.4984
```

`mxnet-cu112 1.7`:
```
[INFO:__main__] Processed 3003 lines. Total time: 302.0189, sec/sent: 0.1006, sent/sec: 9.9431
```

### g4 instance

```
mx18/out.1.bpe.log:[2021-10-04:20:02:32:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 316.4692, sec/sent: 0.1054, sent/sec: 9.4891
mx18/out.64.bpe.log:[2021-10-04:20:03:10:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 31.8175, sec/sent: 0.0106, sent/sec: 94.3819
mx20/out.1.bpe.log:[2021-10-04:20:17:32:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 714.5509, sec/sent: 0.2379, sent/sec: 4.2026
mx20/out.64.bpe.log:[2021-10-04:20:18:26:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 46.4607, sec/sent: 0.0155, sent/sec: 64.6352
```

## To Reproduce

- Download the Sockeye sample model
- Run `translate.sh` with the `master` branch of Sockeye
- Run `translate.sh` with the `mx2` branch of Sockeye

### Steps to reproduce

1. `wget https://github.com/awslabs/sockeye/releases/download/2.3.22/wmt14_en_de.tgz`
2. `tar -xvf wmt14_en_de.tgz`
3. `git clone https://github.com/awslabs/sockeye.git`
4. `pip install -r sockeye/requirements/requirements.gpu-cu112.txt`
5. `mv sockeye/sockeye wmt_14_en_de`
6. `cd wmt_14_en_de`
7. `bash translate.sh` [translate with master branch]
8. `git checkout mx2`
9. Install a nightly build of MXNet 2.x: `pip uninstall mxnet-cu112 ; pip install --pre -f https://dist.mxnet.io/python 'mxnet-cu112'`
10. `bash translate.sh` [translate with mx2 branch]

## What have you tried to solve it?

-

## Environment

- CUDA 11.2 (`conda install -c conda-forge nccl cudnn cudatoolkit==11.2`)
- MXNet 1.8.post0 or MXNet 1.7 vs MXNet 2.x (`2.0.0b20211001`)
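
## Comparing the runs

To put a number on each configuration rather than eyeballing the logs, the slowdown can be computed from the `Total time` field of the summary lines above. This is a minimal sketch that assumes the log format shown in this issue; `parse_summary` and `slowdown` are hypothetical helper names and not part of Sockeye or MXNet.

```python
import re

# Matches the summary line printed at the end of translation,
# in the format shown in the logs of this issue.
SUMMARY_RE = re.compile(
    r"Processed (?P<lines>\d+) lines\. "
    r"Total time: (?P<total>[\d.]+), "
    r"sec/sent: (?P<sec_per_sent>[\d.]+), "
    r"sent/sec: (?P<sent_per_sec>[\d.]+)"
)


def parse_summary(log_line):
    """Extract the timing fields from one summary log line."""
    match = SUMMARY_RE.search(log_line)
    if match is None:
        raise ValueError(f"not a summary line: {log_line!r}")
    return {
        "lines": int(match["lines"]),
        "total": float(match["total"]),
        "sec_per_sent": float(match["sec_per_sent"]),
        "sent_per_sec": float(match["sent_per_sec"]),
    }


def slowdown(baseline_line, candidate_line):
    """Ratio of candidate total time to baseline total time (>1 means slower)."""
    baseline = parse_summary(baseline_line)
    candidate = parse_summary(candidate_line)
    if baseline["lines"] != candidate["lines"]:
        raise ValueError("runs processed a different number of lines")
    return candidate["total"] / baseline["total"]


# p3.2xlarge summary lines copied from this issue: (MXNet 1.7, MXNet 2.x).
P3_RUNS = {
    "batch size 64": (
        "[INFO:__main__] Processed 3003 lines. Total time: 20.2805, sec/sent: 0.0068, sent/sec: 148.0735",
        "[INFO:__main__] Processed 3003 lines. Total time: 37.2888, sec/sent: 0.0124, sent/sec: 80.5336",
    ),
    "batch size 1": (
        "[INFO:__main__] Processed 3003 lines. Total time: 302.0189, sec/sent: 0.1006, sent/sec: 9.9431",
        "[INFO:__main__] Processed 3003 lines. Total time: 858.3818, sec/sent: 0.2858, sent/sec: 3.4984",
    ),
}

if __name__ == "__main__":
    for name, (mx1_line, mx2_line) in P3_RUNS.items():
        print(f"{name}: MXNet 2.x takes {slowdown(mx1_line, mx2_line):.2f}x the time of MXNet 1.7")
```

The same helpers work on the g4 lines, since the regular expression searches within the line and ignores the file-name and timestamp prefix.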
