chrishkchris commented on issue #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)
URL: https://github.com/apache/incubator-singa/pull/535#issuecomment-532944885

Finally, I tested distributed training on an AWS p2.8xlarge, after adding `Sync()` to the SGD loop of resnet.py and resnet_dist.py. The speed-up from using 8 GPUs is now 7.21x, though this comparison uses synthetic inputs rather than real data feeding. See the following throughput comparison between resnet.py and resnet_dist.py:

```
ubuntu@ip-172-31-28-231:~/incubator-singa/examples/autograd$ python3 resnet.py
Start intialization............
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:23<00:00, 1.19it/s]
Throughput = 38.13589358185999 per second
Total=0.8391045022010803, forward=0.26401839971542357, softmax=0.0020227289199829103, backward=0.5730633735656739, sgd=0.016838366985321044
ubuntu@ip-172-31-28-231:~/incubator-singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 resnet_dist.py
Start intialization...........
100%|██████████| 100/100 [01:33<00:00, 1.08it/s]
Throughput = 274.9947180123401 per second
Total=0.9309269714355469, forward=0.2690380573272705, softmax=0.0021610450744628906, backward=0.6597278690338135, sgd=0.10374969005584717
```
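For reference, below is a minimal sketch (not the PR's exact diff) of where such `Sync()` calls would sit in the timed loop, assuming the SINGA autograd API used by examples/autograd/resnet.py at the time. The `resnet50` import, batch size, and hyperparameters are illustrative assumptions, not copied from the PR:

```python
# Sketch of a timed SGD benchmark loop with device Sync() calls,
# assuming the era's SINGA autograd API (examples/autograd/resnet.py).
import time
import numpy as np
from singa import autograd, device, opt, tensor
from resnet import resnet50  # assumes resnet.py's model constructor is importable

dev = device.create_cuda_gpu()
model = resnet50()
sgd = opt.SGD(lr=0.1, momentum=0.9)  # resnet_dist.py would wrap this in a distributed optimizer

# Synthetic inputs and labels: no real data feeding, as noted above.
batch = 32
x = np.random.randn(batch, 3, 224, 224).astype(np.float32)
y = np.random.randint(0, 1000, batch, dtype=np.int32)
tx = tensor.Tensor((batch, 3, 224, 224), dev, tensor.float32)
ty = tensor.Tensor((batch,), dev, tensor.int32)
tx.copy_from_numpy(x)
ty.copy_from_numpy(y)

autograd.training = True
niters = 100

dev.Sync()  # drain any queued GPU work before starting the clock
start = time.time()
for _ in range(niters):
    out = model(tx)                                  # forward
    loss = autograd.softmax_cross_entropy(out, ty)   # softmax + loss
    for p, g in autograd.backward(loss):             # backward
        sgd.update(p, g)                             # SGD update
dev.Sync()  # wait for asynchronous kernels to finish before stopping the clock
end = time.time()

print("Throughput = %s per second" % (niters * batch / (end - start)))
```

The point of the second `Sync()` is that GPU kernels are launched asynchronously; without it, `time.time()` can be read before the device has finished its work, inflating the apparent throughput.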