chrishkchris opened a new pull request #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)
URL: https://github.com/apache/incubator-singa/pull/535

I have fused the small operations of momentum SGD to increase GPU computation efficiency and decrease latency. I have also added a Sync() call to resnet.py for more accurate time profiling: it waits for the preceding CUDA operations to finish before the elapsed time is calculated.

1. This is the new result after improving the momentum SGD:
```
ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
Starting Epoch 0:
Training loss = 583.052124, training accuracy = 0.793690
Evaluation accuracy = 0.943409, Elapsed Time = 4.191409s
Starting Epoch 1:
Training loss = 229.894424, training accuracy = 0.923609
Evaluation accuracy = 0.961438, Elapsed Time = 4.170332s
Starting Epoch 2:
Training loss = 168.670303, training accuracy = 0.943937
Evaluation accuracy = 0.964744, Elapsed Time = 4.186504s
Starting Epoch 3:
Training loss = 133.865494, training accuracy = 0.955259
Evaluation accuracy = 0.978566, Elapsed Time = 4.188593s
Starting Epoch 4:
Training loss = 116.104378, training accuracy = 0.961730
Evaluation accuracy = 0.971554, Elapsed Time = 4.195830s
Starting Epoch 5:
Training loss = 101.295425, training accuracy = 0.966299
Evaluation accuracy = 0.974059, Elapsed Time = 4.191312s
Starting Epoch 6:
Training loss = 94.570869, training accuracy = 0.969684
Evaluation accuracy = 0.977464, Elapsed Time = 4.181115s
Starting Epoch 7:
Training loss = 85.930618, training accuracy = 0.970968
Evaluation accuracy = 0.984675, Elapsed Time = 4.182598s
Starting Epoch 8:
Training loss = 83.169617, training accuracy = 0.971768
Evaluation accuracy = 0.985076, Elapsed Time = 4.202356s
Starting Epoch 9:
Training loss = 77.906853, training accuracy = 0.973969
Evaluation accuracy = 0.982372, Elapsed Time = 4.191382s
ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 resnet.py
Start intialization............
100%|███████████████████████████████████████████████████████████████████████| 100/100 [01:26<00:00, 1.14it/s]
Throughput = 36.89267491263885 per second
Total=0.8673808574676514, forward=0.2684857630729675, softmax=0.0027115750312805176, backward=0.5961835193634033, sgd=0.03734057664871216
```

2. This is the old result before improving the momentum SGD:
```
ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
Starting Epoch 0:
Training loss = 581.382263, training accuracy = 0.794974
Evaluation accuracy = 0.934495, Elapsed Time = 5.541576s
Starting Epoch 1:
Training loss = 233.281906, training accuracy = 0.920808
Evaluation accuracy = 0.953025, Elapsed Time = 5.492121s
Starting Epoch 2:
Training loss = 169.505447, training accuracy = 0.943503
Evaluation accuracy = 0.971454, Elapsed Time = 5.493372s
Starting Epoch 3:
Training loss = 136.643906, training accuracy = 0.954309
Evaluation accuracy = 0.975761, Elapsed Time = 5.513660s
Starting Epoch 4:
Training loss = 116.743042, training accuracy = 0.960963
Evaluation accuracy = 0.979968, Elapsed Time = 5.526858s
Starting Epoch 5:
Training loss = 103.864464, training accuracy = 0.965732
Evaluation accuracy = 0.979667, Elapsed Time = 5.513694s
Starting Epoch 6:
Training loss = 94.542282, training accuracy = 0.968550
Evaluation accuracy = 0.975461, Elapsed Time = 5.520474s
Starting Epoch 7:
Training loss = 87.548050, training accuracy = 0.971368
Evaluation accuracy = 0.980970, Elapsed Time = 5.535038s
Starting Epoch 8:
Training loss = 83.162071, training accuracy = 0.971485
Evaluation accuracy = 0.975661, Elapsed Time = 5.536836s
Starting Epoch 9:
Training loss = 78.447533, training accuracy = 0.974570
Evaluation accuracy = 0.982772, Elapsed Time = 5.547574s
ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 resnet.py
Start intialization............
100%|███████████████████████████████████████████████████████████████████████| 100/100 [01:49<00:00, 1.11s/it]
Throughput = 29.05542749993395 per second
Total=1.101343286037445, forward=0.270987823009491, softmax=0.0029543495178222657, backward=0.8274011135101318, sgd=0.3130151700973511
```

Comparing results (1) and (2): fusing the small operations cuts the per-iteration SGD time on ResNet from about 0.313s to 0.037s, raising throughput from roughly 29.1 to 36.9 images per second, and reduces the MNIST CNN epoch time from about 5.5s to 4.2s.
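The idea behind the speed-up can be illustrated with a minimal NumPy sketch (illustrative only: the actual change fuses the operations inside SINGA's CUDA backend, and the function names below are hypothetical):

```python
import numpy as np

def momentum_sgd_unfused(param, grad, vel, lr=0.01, momentum=0.9):
    # Each statement is a separate element-wise pass; on a GPU each
    # would be its own small kernel launch, paying launch latency and
    # re-reading the tensors from device memory every time.
    vel *= momentum
    vel += lr * grad
    param -= vel

def momentum_sgd_fused(param, grad, vel, lr=0.01, momentum=0.9):
    # The same arithmetic done in a single pass over the data -- the
    # shape a fused CUDA kernel would take (Python loop for clarity).
    for i in range(param.size):
        v = momentum * vel.flat[i] + lr * grad.flat[i]
        vel.flat[i] = v
        param.flat[i] -= v
```

Both variants compute the identical update; the fused form simply avoids the repeated kernel launches and memory traffic of the three separate passes.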
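The reason a Sync() is needed before reading the clock is that CUDA kernel launches return immediately, so a naive timer only measures launch overhead, not the work still in flight. A hedged stand-in using a single-worker thread pool as the "stream" (hypothetical analogy; the real synchronization call lives in resnet.py):

```python
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)  # stands in for the CUDA stream

def fake_kernel():
    time.sleep(0.2)  # pretend GPU work

# Naive timing: the "launch" returns immediately, so almost no time
# is attributed to the work that is still executing asynchronously.
start = time.time()
fut = pool.submit(fake_kernel)
naive = time.time() - start
fut.result()  # drain the queue before the next measurement

# Timing with an explicit synchronization: the analogue of calling
# Sync() to wait for outstanding CUDA operations before reading the clock.
start = time.time()
pool.submit(fake_kernel).result()
synced = time.time() - start
pool.shutdown()
```

Without the synchronization, the measured `sgd` component would look artificially cheap while its cost leaks into whichever later phase happens to block.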