happylee524 opened a new issue, #21005:
URL: https://github.com/apache/incubator-mxnet/issues/21005

   ## Description
   I run a distributed training program using the `launch.py` method, but after the training task finishes, the program does not exit or return.
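A CPython process does not exit while a non-daemon thread is still running, so one generic way to narrow down a hang like this is to list such threads and dump their stacks once training has finished. The sketch below is a plain-Python debugging aid, not an MXNet API; `report_lingering_threads` is a hypothetical helper name. It only sees Python-level threads. Threads created natively inside MXNet's C++ engine or its parameter-server backend will not appear in the list.

```python
import faulthandler
import sys
import threading


def report_lingering_threads(dump_stacks=True):
    """Return non-daemon Python threads (other than the main thread) still alive.

    The interpreter joins every non-daemon thread before it exits, so any
    thread returned here is a candidate for blocking shutdown.
    """
    main = threading.main_thread()
    lingering = [
        t for t in threading.enumerate()
        if t is not main and t.is_alive() and not t.daemon
    ]
    for t in lingering:
        print(f"still running: {t.name} (ident={t.ident})", file=sys.stderr)
    if dump_stacks:
        # Print the current stack of every Python thread to the real stderr,
        # which shows the call each lingering thread is blocked in.
        faulthandler.dump_traceback(file=sys.__stderr__, all_threads=True)
    return lingering
```

Calling it as the last statement of the training script (an assumed placement) would show whether a Python thread is what keeps the process alive. An empty list points toward native threads or the launcher itself.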
   
   
   
   ### Error Message
   This is the log output:
   
![Run log](https://user-images.githubusercontent.com/23152071/164203345-aeec3754-4e33-4cd5-821e-60838456b622.png)
   
   This is my code:
   
![Code](https://user-images.githubusercontent.com/23152071/164203646-713c624a-1321-4234-8395-77afdee339c5.png)
   
   
   ### Steps to reproduce
   
   1. Start command: `python /usr/local/lib/python3.8/dist-packages/mxnet/tools/launch.py -n 2 -H /tmp/algorithm/Host --sync-dst-dir /tmp/algorithm/mnist_sync --launcher ssh "python /tmp/algorithm/image_classification.py --dataset mnist --model alexnet --epochs 3 --gpus 0,1"`
   2. I run my code in two Kubernetes pods using the image `mxnet/python:2.0.0beta1_gpu_cu110_py3`.
   3. MXNet version: 2.0.0
   4. My code: `image_classification.py`, copied from the MXNet GitHub repository.
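To check the hang reproducibly instead of watching the terminal, the start command can be run under a timeout. This is a minimal sketch using only the standard library; `exits_within` is a hypothetical helper, and the timeout value is an assumption that should be comfortably longer than the 3-epoch run.

```python
import subprocess


def exits_within(cmd, timeout_s):
    """Run cmd and report whether it exited on its own within timeout_s seconds.

    On timeout, subprocess.run kills the direct child and waits for it before
    re-raising TimeoutExpired. Processes that the child started on other hosts
    (for example via ssh) are NOT cleaned up by this.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return False
    return True
```

For this issue, `cmd` would be the start command from step 1 split into an argument list. A `False` result confirms that the launcher is still running after training should have finished. After such a run, the worker processes on both pods may need to be cleaned up by hand.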
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
