meanmee commented on issue #12363: distributed training notebook tests
URL: https://github.com/apache/incubator-mxnet/issues/12363#issuecomment-416847397

First of all, check out: https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training

1. Install Anaconda on all your machines.

2. Run "pip install mxnet-cu80==1.2.1" on all your machines (or "pip install mxnet-cu90", depending on the CUDA version in your machine's environment).

3. Make all your machines able to SSH into each other without entering a password. For machine A to machine B:

   On machine A, run:
       cd ~/.ssh
       ssh-keygen -t rsa
   Two files are generated: id_rsa is the private (secret) key and id_rsa.pub is the public key.

   On machine B, open ~/.ssh/authorized_keys (e.g. with vim) and paste in the contents of machine A's ~/.ssh/id_rsa.pub.

   Do the same so that machine B can SSH into machine A without a password. Passwordless SSH from machine A to itself and from machine B to itself is also needed.

4. Then run:

       python /home/xiaomin.wu/anaconda2/lib/python2.7/site-packages/mxnet/tools/launch.py -n 2 -s 2 -H hosts --sync-dst-dir /home/xiaomin.wu/cifar10_dist --launcher ssh "/home/xiaomin.wu/anaconda2/bin/python cifar10_dist.py"

   Here "-n 2 -s 2" starts two workers and two servers, and hosts is a file listing your machines, one per line.

   We use /home/xiaomin.wu/anaconda2/bin/python instead of plain python for the remote command. If you just use python, the machines may pick up /usr/bin/python, which doesn't have the MXNet you installed into Anaconda, and that will drive you crazy.
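The manual copy-and-paste into ~/.ssh/authorized_keys can also be scripted. The following is a minimal Python 3 sketch. It assumes machine A's id_rsa.pub has already been copied to the machine being logged into, and the helper name append_authorized_key is hypothetical. If OpenSSH's ssh-copy-id tool is available on your machines, it does essentially the same job over SSH.

```python
from pathlib import Path


def append_authorized_key(pubkey_line: str, authorized_keys: Path) -> bool:
    """Hypothetical helper: add one public key to authorized_keys.

    Returns True if the key was added, False if it was already present.
    """
    key = pubkey_line.strip()
    if not key:
        raise ValueError("empty public key")

    # ~/.ssh is normally expected to be private to its owner (mode 700).
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)

    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False  # already authorized, nothing to do

    # Keep one key per line even if the file lacks a trailing newline.
    if text and not text.endswith("\n"):
        text += "\n"
    authorized_keys.write_text(text + key + "\n")

    # sshd may ignore the file if others can write to it (StrictModes).
    authorized_keys.chmod(0o600)
    return True
```

For example, on machine B you could call append_authorized_key(Path("id_rsa.pub").read_text(), Path.home() / ".ssh" / "authorized_keys"). Because the function checks for an existing copy first, running it twice for the same key is harmless. Repeat the call for every machine pair, including each machine with itself.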
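The final launch step can be sketched in Python 3 as well. This is a minimal sketch, not part of MXNet, and it rests on three assumptions:

- The hosts file simply lists one hostname per line.
- Anaconda is installed at the same path on every machine.
- The same Anaconda interpreter is used both locally and for the remote command.

write_hosts_file and build_launch_cmd are hypothetical helpers, and machine-a/machine-b are placeholder hostnames.

```python
import shlex
from pathlib import Path
from typing import List


def write_hosts_file(path: Path, hosts: List[str]) -> None:
    # Assumed format: one host per line, blank entries skipped.
    path.write_text("".join(h.strip() + "\n" for h in hosts if h.strip()))


def build_launch_cmd(python_bin: str, launch_py: str, hosts_file: str,
                     sync_dst_dir: str, script: str,
                     num_workers: int = 2, num_servers: int = 2) -> List[str]:
    # The remote command uses an absolute interpreter path, so the remote
    # shell cannot silently fall back to /usr/bin/python.
    remote_cmd = f"{python_bin} {script}"
    return [python_bin, launch_py,
            "-n", str(num_workers), "-s", str(num_servers),
            "-H", hosts_file,
            "--sync-dst-dir", sync_dst_dir,
            "--launcher", "ssh",
            remote_cmd]


if __name__ == "__main__":
    # Only prints the command; nothing is written or executed here.
    cmd = build_launch_cmd(
        python_bin="/home/xiaomin.wu/anaconda2/bin/python",
        launch_py="/home/xiaomin.wu/anaconda2/lib/python2.7/site-packages/mxnet/tools/launch.py",
        hosts_file="hosts",
        sync_dst_dir="/home/xiaomin.wu/cifar10_dist",
        script="cifar10_dist.py",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once the hosts file is written (e.g. write_hosts_file(Path("hosts"), ["machine-a", "machine-b"])), the list could be passed to subprocess.run(cmd, check=True). Building it as a list avoids having to quote the remote command by hand.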
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services