chrishkchris commented on a change in pull request #728:
URL: https://github.com/apache/singa/pull/728#discussion_r437997370
##########
File path: python/singa/device.py
##########
@@ -113,10 +113,8 @@ def create_cuda_gpu(set_default=False):
a swig converted CudaGPU device.
'''
assert singa.USE_CUDA, 'SINGA has not been compiled with CUDA enabled.'
- devices = singa.Platform.CreateCudaGPUs(1)
- if set_default:
- set_default_device(devices[0])
- return devices[0]
+ device = create_cuda_gpu_on(0, set_default)
Review comment:
I added the default_gpu_device.
I also removed the unused `set_default_device(device)` call,
which set the default device for graph operations. Its purpose was to
keep the `to_device` operation from being buffered, because a buffered
`to_device` is repeated every iteration. Since the new layer API does the
initialization separately, `set_default_device(device)` is no longer necessary.
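To illustrate the delegation in the diff above, here is a minimal sketch that runs without SINGA. `FakeCudaGPU`, `get_default_device`, and the bodies of `create_cuda_gpu_on` and `set_default_device` are hypothetical stand-ins, not the real implementation; only the call `create_cuda_gpu_on(0, set_default)` comes from the diff, and how the real function treats `set_default` is an assumption here.

```python
# Hypothetical stand-ins: this does NOT import or reproduce SINGA's code.
_default_device = None  # assumed module-level default device


class FakeCudaGPU:
    """Placeholder for the swig-converted CudaGPU device."""

    def __init__(self, device_id):
        self.device_id = device_id


def set_default_device(device):
    """Record `device` as the module-level default (assumed behaviour)."""
    global _default_device
    _default_device = device


def get_default_device():
    """Hypothetical accessor used only to inspect the sketch."""
    return _default_device


def create_cuda_gpu_on(device_id, set_default=False):
    """Create a device for the given ID; assumed to handle `set_default`."""
    device = FakeCudaGPU(device_id)
    if set_default:
        set_default_device(device)
    return device


def create_cuda_gpu(set_default=False):
    """Delegate to create_cuda_gpu_on with device ID 0, as in the diff."""
    return create_cuda_gpu_on(0, set_default)


gpu = create_cuda_gpu(set_default=True)
```

With this shape, the device-creation logic lives in one place, and `create_cuda_gpu` is only a convenience wrapper for device 0.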
I have tested (i) CNN on MNIST and (ii) ResNet on CIFAR10; training is OK in both cases.
(i) CNN MNIST
```
root@56142bc34887:~/dcsysh/singa/examples/cnn# mpiexec -np 8 python3 train_mpi.py cnn mnist -l 0.04
Starting Epoch 0:
Training loss = 834.012695, training accuracy = 0.701389
Evaluation accuracy = 0.921258, Elapsed Time = 0.849777s
Starting Epoch 1:
Training loss = 262.023132, training accuracy = 0.910857
Evaluation accuracy = 0.963405, Elapsed Time = 0.775921s
Starting Epoch 2:
Training loss = 175.929840, training accuracy = 0.941256
Evaluation accuracy = 0.962068, Elapsed Time = 0.784108s
Starting Epoch 3:
Training loss = 138.893051, training accuracy = 0.953609
Evaluation accuracy = 0.971937, Elapsed Time = 0.796294s
Starting Epoch 4:
Training loss = 119.912170, training accuracy = 0.959602
Evaluation accuracy = 0.974609, Elapsed Time = 0.788538s
Starting Epoch 5:
Training loss = 106.647232, training accuracy = 0.964159
Evaluation accuracy = 0.976460, Elapsed Time = 0.791224s
Starting Epoch 6:
Training loss = 100.382210, training accuracy = 0.966763
Evaluation accuracy = 0.978516, Elapsed Time = 0.804543s
Starting Epoch 7:
Training loss = 86.349213, training accuracy = 0.971137
Evaluation accuracy = 0.976562, Elapsed Time = 0.798764s
Starting Epoch 8:
Training loss = 82.797058, training accuracy = 0.972055
Evaluation accuracy = 0.982833, Elapsed Time = 0.746563s
Starting Epoch 9:
Training loss = 77.220978, training accuracy = 0.974376
Evaluation accuracy = 0.977796, Elapsed Time = 0.738852s
```
(ii) RESNET CIFAR10
```
root@56142bc34887:~/dcsysh/singa/examples/cnn# mpiexec -np 8 python3 train_mpi.py resnet cifar10 -b 32 -l 0.04
Starting Epoch 0:
Training loss = 3952.119385, training accuracy = 0.216567
Evaluation accuracy = 0.342648, Elapsed Time = 52.988312s
Starting Epoch 1:
Training loss = 2519.932373, training accuracy = 0.399439
Evaluation accuracy = 0.467849, Elapsed Time = 52.414376s
Starting Epoch 2:
Training loss = 2165.224854, training accuracy = 0.497937
Evaluation accuracy = 0.560998, Elapsed Time = 52.504168s
Starting Epoch 3:
Training loss = 1884.613525, training accuracy = 0.565605
Evaluation accuracy = 0.596755, Elapsed Time = 52.721652s
Starting Epoch 4:
Training loss = 1682.462158, training accuracy = 0.617188
Evaluation accuracy = 0.643429, Elapsed Time = 52.857880s
Starting Epoch 5:
Training loss = 1514.762329, training accuracy = 0.654888
Evaluation accuracy = 0.689002, Elapsed Time = 52.902534s
Starting Epoch 6:
Training loss = 1372.399536, training accuracy = 0.689283
Evaluation accuracy = 0.708434, Elapsed Time = 53.047120s
Starting Epoch 7:
Training loss = 1220.632446, training accuracy = 0.726362
Evaluation accuracy = 0.743389, Elapsed Time = 53.035040s
Starting Epoch 8:
Training loss = 1110.090942, training accuracy = 0.751182
Evaluation accuracy = 0.761919, Elapsed Time = 53.105462s
Starting Epoch 9:
Training loss = 1006.458618, training accuracy = 0.775160
Evaluation accuracy = 0.772436, Elapsed Time = 53.146286s
```
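For anyone who wants to check logs like these mechanically, the following is a rough sketch that parses the `Training loss` lines and confirms the loss drops every epoch. The helper names `parse_training_log` and `loss_is_decreasing` are made up for illustration and are not part of SINGA; the sample text is copied from the first two epochs of the CNN MNIST log.

```python
import re

# Matches lines such as:
# "Training loss = 834.012695, training accuracy = 0.701389"
LOSS_RE = re.compile(
    r"Training loss = ([0-9.]+), training accuracy = ([0-9.]+)")


def parse_training_log(text):
    """Return a list of (loss, accuracy) tuples, one per epoch found."""
    return [(float(loss), float(acc)) for loss, acc in LOSS_RE.findall(text)]


def loss_is_decreasing(records):
    """True if each epoch's training loss is lower than the previous one."""
    losses = [loss for loss, _ in records]
    return all(later < earlier for earlier, later in zip(losses, losses[1:]))


sample = """Starting Epoch 0:
Training loss = 834.012695, training accuracy = 0.701389
Evaluation accuracy = 0.921258, Elapsed Time = 0.849777s
Starting Epoch 1:
Training loss = 262.023132, training accuracy = 0.910857
Evaluation accuracy = 0.963405, Elapsed Time = 0.775921s
"""

records = parse_training_log(sample)
print(records, loss_is_decreasing(records))
```

A strictly decreasing loss is only a coarse sanity check; the evaluation accuracy lines would need a second pattern.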