chrishkchris commented on pull request #697: URL: https://github.com/apache/singa/pull/697#issuecomment-637375715
I am using this PR to train XceptionNet so that I can use the save_state function, but I encountered something strange:

(i) Training and evaluation were both okay in https://github.com/apache/singa/pull/651

```
(singa) dcsysh@panda7:~/singa/examples/autograd$ python3 train.py xceptionnet ci
Starting Epoch 0:
Training loss = 11198.645508, training accuracy = 0.214420
Evaluation accuracy = 0.309000, Elapsed Time = 606.547117s
Starting Epoch 1:
Training loss = 6354.611328, training accuracy = 0.381020
Evaluation accuracy = 0.457300, Elapsed Time = 612.817129s
```

(ii) With this PR, I think the training is okay, but something is wrong in the evaluation: the evaluation accuracy stays at about 0.0999 (roughly chance level for the 10 classes of CIFAR-10) while the training accuracy keeps rising.

```
root@e8a757397ca3:~/dcsysh/singa/examples/cnn# mpiexec -np 8 python3 train_mpi.py xceptionnet cifar10 --bs 16 --lr 0.04 --epoch 30
Starting Epoch 0:
Training loss = 11614.897461, training accuracy = 0.131190
Evaluation accuracy = 0.099860, Elapsed Time = 98.705291s
Starting Epoch 1:
Training loss = 6932.552246, training accuracy = 0.157552
Evaluation accuracy = 0.099860, Elapsed Time = 98.400360s
Starting Epoch 2:
Training loss = 6565.343262, training accuracy = 0.195853
Evaluation accuracy = 0.099960, Elapsed Time = 99.807898s
Starting Epoch 3:
Training loss = 6173.305176, training accuracy = 0.254467
Evaluation accuracy = 0.099960, Elapsed Time = 99.759293s
Starting Epoch 4:
Training loss = 5841.223633, training accuracy = 0.306430
Evaluation accuracy = 0.099960, Elapsed Time = 99.962356s
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
