chrishkchris commented on a change in pull request #468: Distributted module
URL: https://github.com/apache/incubator-singa/pull/468#discussion_r317937233
##########
File path: examples/autograd/mnist_dist.py
##########
@@ -0,0 +1,251 @@
+#

Review comment:
I have modified mnist_cnn.py and mnist_dist.py:

1. The model construction, data preprocessing, and training code are in mnist_cnn.py.
2. mnist_dist.py imports the functions from mnist_cnn.py and passes the dist opt into train_mnist_cnn() to run distributed training (requires MPI).
3. download_mnist.py has been added to the same directory. It downloads the dataset before training, and is kept separate from the training code so that multiple processes do not download the data at the same time.

Here is the log from running the code:

```
ubuntu@ip-172-31-21-218:~/incubator-singa/examples/autograd$ python3 download_mnist.py
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
ubuntu@ip-172-31-21-218:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
Starting Epoch 0:
Training loss = 586.417175, training accuracy = 0.792840
Evaluation accuracy = 0.940104, Elapsed Time = 5.638494s
Starting Epoch 1:
Training loss = 235.360107, training accuracy = 0.922292
Evaluation accuracy = 0.955429, Elapsed Time = 5.563161s
Starting Epoch 2:
Training loss = 170.056442, training accuracy = 0.943270
Evaluation accuracy = 0.963942, Elapsed Time = 5.579273s
Starting Epoch 3:
Training loss = 135.514252, training accuracy = 0.954476
Evaluation accuracy = 0.967248, Elapsed Time = 5.562721s
Starting Epoch 4:
Training loss = 116.975700, training accuracy = 0.960812
Evaluation accuracy = 0.978265, Elapsed Time = 5.583826s
Starting Epoch 5:
Training loss = 103.893723, training accuracy = 0.965065
Evaluation accuracy = 0.982372, Elapsed Time = 5.585272s
Starting Epoch 6:
Training loss = 95.044586, training accuracy = 0.967266
Evaluation accuracy = 0.981671, Elapsed Time = 5.580424s
Starting Epoch 7:
Training loss = 89.102654, training accuracy = 0.971118
Evaluation accuracy = 0.980268, Elapsed Time = 5.583646s
Starting Epoch 8:
Training loss = 80.395744, training accuracy = 0.972969
Evaluation accuracy = 0.983273, Elapsed Time = 5.600029s
Starting Epoch 9:
Training loss = 78.355209, training accuracy = 0.973119
Evaluation accuracy = 0.979267, Elapsed Time = 5.587740s
ubuntu@ip-172-31-21-218:~/incubator-singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 mnist_dist.py
Starting Epoch 0:
Training loss = 781.167480, training accuracy = 0.719017
Evaluation accuracy = 0.918586, Elapsed Time = 1.255623s
Starting Epoch 1:
Training loss = 259.223297, training accuracy = 0.912276
Evaluation accuracy = 0.950863, Elapsed Time = 1.216926s
Starting Epoch 2:
Training loss = 179.333084, training accuracy = 0.940605
Evaluation accuracy = 0.968030, Elapsed Time = 1.206751s
Starting Epoch 3:
Training loss = 137.840988, training accuracy = 0.954243
Evaluation accuracy = 0.975946, Elapsed Time = 1.202503s
Starting Epoch 4:
Training loss = 119.743629, training accuracy = 0.959836
Evaluation accuracy = 0.973581, Elapsed Time = 1.208274s
Starting Epoch 5:
Training loss = 102.545876, training accuracy = 0.965595
Evaluation accuracy = 0.980572, Elapsed Time = 1.205539s
Starting Epoch 6:
Training loss = 93.249054, training accuracy = 0.969401
Evaluation accuracy = 0.978207, Elapsed Time = 1.203708s
Starting Epoch 7:
Training loss = 84.655556, training accuracy = 0.971104
Evaluation accuracy = 0.980777, Elapsed Time = 1.206410s
Starting Epoch 8:
Training loss = 77.996643, training accuracy = 0.973691
Evaluation accuracy = 0.985609, Elapsed Time = 1.207295s
Starting Epoch 9:
Training loss = 75.888077, training accuracy = 0.974442
Evaluation accuracy = 0.982319, Elapsed Time = 1.203693s
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
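For a quick comparison of the two runs in the log above, the per-epoch times can be averaged. This is only a rough sketch: the values are copied by hand from the "Elapsed Time" fields, the names `single_process`, `distributed`, and `speedup` are made up for illustration, and the log does not say how many processes were listed in `host_file`, so the result is a wall-clock ratio (about 4.6x here), not a scaling-efficiency figure.

```python
from statistics import mean

# Per-epoch "Elapsed Time" values copied from the log above, in seconds.
# Run of: python3 mnist_cnn.py
single_process = [
    5.638494, 5.563161, 5.579273, 5.562721, 5.583826,
    5.585272, 5.580424, 5.583646, 5.600029, 5.587740,
]

# Run of: mpiexec --hostfile host_file python3 mnist_dist.py
distributed = [
    1.255623, 1.216926, 1.206751, 1.202503, 1.208274,
    1.205539, 1.203708, 1.206410, 1.207295, 1.203693,
]


def speedup(baseline, candidate):
    """Return the ratio of mean epoch times (baseline / candidate)."""
    return mean(baseline) / mean(candidate)


ratio = speedup(single_process, distributed)
print(f"mean epoch time, mnist_cnn.py:  {mean(single_process):.3f}s")
print(f"mean epoch time, mnist_dist.py: {mean(distributed):.3f}s")
print(f"wall-clock speedup:             {ratio:.2f}x")
```

The final evaluation accuracies in the two runs (0.979267 and 0.982319) are close, so the shorter epoch time does not appear to come at the cost of accuracy in this log.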