chrishkchris edited a comment on pull request #697:
URL: https://github.com/apache/singa/pull/697#issuecomment-637375715


   I am using this PR to train Xceptionnet in order to use the save_state 
function, but I encountered something strange:
   
   (i) The training and evaluation were both okay in 
https://github.com/apache/singa/pull/651
   ```
   (singa) dcsysh@panda7:~/singa/examples/autograd$ python3 train.py xceptionnet ci
   Starting Epoch 0:
   Training loss = 11198.645508, training accuracy = 0.214420
   Evaluation accuracy = 0.309000, Elapsed Time = 606.547117s
   Starting Epoch 1:
   Training loss = 6354.611328, training accuracy = 0.381020
   Evaluation accuracy = 0.457300, Elapsed Time = 612.817129s
   ```
   
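   (ii) This time the training looks okay, but something is wrong in the evaluation: the evaluation accuracy stays at about 0.0999 in every epoch while the training accuracy keeps rising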
   ```
   root@e8a757397ca3:~/dcsysh/singa/examples/cnn# mpiexec -np 8 python3 train_mpi.py xceptionnet cifar10 --bs 16 --lr 0.04 --epoch 30
   Starting Epoch 0:
   Training loss = 11614.897461, training accuracy = 0.131190
   Evaluation accuracy = 0.099860, Elapsed Time = 98.705291s
   Starting Epoch 1:
   Training loss = 6932.552246, training accuracy = 0.157552
   Evaluation accuracy = 0.099860, Elapsed Time = 98.400360s
   Starting Epoch 2:
   Training loss = 6565.343262, training accuracy = 0.195853
   Evaluation accuracy = 0.099960, Elapsed Time = 99.807898s
   Starting Epoch 3:
   Training loss = 6173.305176, training accuracy = 0.254467
   Evaluation accuracy = 0.099960, Elapsed Time = 99.759293s
   Starting Epoch 4:
   Training loss = 5841.223633, training accuracy = 0.306430
   Evaluation accuracy = 0.099960, Elapsed Time = 99.962356s
   ```
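
For reference, an evaluation accuracy stuck near 0.1 is what a model scores on a balanced 10-class test set when it predicts the same class for every sample. The sketch below only illustrates that arithmetic. It is plain Python, not SINGA code, and the helper names, the synthetic labels, and the 1000-samples-per-class split are assumptions made for the illustration. It does not identify the cause of the behaviour in (ii).

```python
import random


def accuracy(preds, labels):
    """Return the fraction of predictions that equal the labels."""
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


def balanced_labels(num_classes=10, per_class=1000, seed=0):
    """Build a shuffled, perfectly balanced label list (synthetic data)."""
    labels = [c for c in range(num_classes) for _ in range(per_class)]
    random.Random(seed).shuffle(labels)
    return labels


labels = balanced_labels()

# A "collapsed" classifier: every sample gets the same (arbitrary) class.
constant_preds = [3] * len(labels)
const_acc = accuracy(constant_preds, labels)

# A perfect classifier, for contrast.
perfect_acc = accuracy(list(labels), labels)

print("constant-class accuracy:", const_acc)
print("perfect accuracy:", perfect_acc)
```

If the evaluation output in (ii) behaves like `constant_preds`, comparing the predicted classes on a few evaluation batches against the labels would confirm it.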



