meanmee commented on issue #12363: distributed training notebook tests
URL: https://github.com/apache/incubator-mxnet/issues/12363#issuecomment-416847397
 
 
   First of all, check out: https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training
   Install Anaconda on all your machines.
   Run "pip install mxnet-cu80==1.2.1" on all your machines (or "pip install mxnet-cu90", depending on each machine's environment, i.e. CUDA 8.0 vs. CUDA 9.0).
   Make all your machines able to SSH into each other without entering a password.
   For machine A to machine B, on machine A:
   cd ~/.ssh
   ssh-keygen -t rsa
   Two files are generated: id_rsa is the private key, and id_rsa.pub is the public key.
   On machine B: vim ~/.ssh/authorized_keys, then paste in the contents of machine A's ~/.ssh/id_rsa.pub.
   Do the same in the other direction so that machine B can SSH to machine A without a password.
   Machine A to machine A and machine B to machine B are also needed.
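The copy-the-key step above can be scripted. What follows is only a minimal sketch, assuming Python 3 is available on the target machine; the helper names `authorize_key` and `passwordless_ssh_works` are hypothetical and are not part of MXNet or OpenSSH. The first appends a public key to `authorized_keys` without duplicating it, and the second checks that a login really works without a prompt by using ssh's `BatchMode=yes` option, which makes ssh fail instead of asking for a password.

```python
import os
import subprocess
from pathlib import Path


def authorize_key(public_key, ssh_dir):
    """Append public_key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    (Hypothetical helper, not part of any library.)
    """
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    key = public_key.strip()

    text = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False

    with auth_file.open("a") as fh:
        # Make sure the new key starts on its own line.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")
    # sshd usually ignores an authorized_keys file with loose permissions.
    os.chmod(str(auth_file), 0o600)
    return True


def passwordless_ssh_works(host, timeout=10):
    """Return True if `ssh host true` succeeds without any password prompt."""
    cmd = ["ssh", "-o", "BatchMode=yes",
           "-o", "ConnectTimeout=%d" % timeout, host, "true"]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Running `passwordless_ssh_works` for every pair of machines, including each machine to itself, before starting the job should catch a missing key early rather than in the middle of a launch.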
   Then run:
    python /home/xiaomin.wu/anaconda2/lib/python2.7/site-packages/mxnet/tools/launch.py -n 2 -s 2 -H hosts --sync-dst-dir /home/xiaomin.wu/cifar10_dist --launcher ssh "/home/xiaomin.wu/anaconda2/bin/python cifar10_dist.py"
   Here we use /home/xiaomin.wu/anaconda2/bin/python instead of python, because if we just use python here, the machines may pick up /usr/bin/python rather than the Anaconda interpreter where mxnet was installed, which will drive you crazy.
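One way to avoid the bare `python` pitfall is to build the command programmatically and refuse a relative interpreter path. This is a rough sketch, not part of MXNet: `build_launch_command` is a hypothetical helper, it uses only the `launch.py` flags that appear in the command above, and it assumes (as in that command) that the remote command is passed as a single quoted string. Unlike the command above, it also runs `launch.py` itself with the absolute interpreter, which is a choice of this sketch.

```python
import os
import shlex


def build_launch_command(launch_py, hostfile, sync_dst_dir, interpreter, script,
                         num_workers=2, num_servers=2):
    """Return the argv list for a launch.py call like the one quoted above.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    if not os.path.isabs(interpreter):
        # A bare "python" may resolve to /usr/bin/python on the remote machines.
        raise ValueError("interpreter must be an absolute path, got %r" % interpreter)

    # The command executed on each machine, passed to launch.py as one string.
    remote_command = "%s %s" % (shlex.quote(interpreter), shlex.quote(script))
    return [
        interpreter, launch_py,
        "-n", str(num_workers),      # number of workers
        "-s", str(num_servers),      # number of servers
        "-H", hostfile,              # file listing the machines
        "--sync-dst-dir", sync_dst_dir,
        "--launcher", "ssh",
        remote_command,
    ]
```

The returned list can be handed to `subprocess.run` directly, which avoids a second layer of shell quoting on the launching machine.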

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
