Hi, I think I found a typo in the documentation at http://mxnet.io/architecture/note_engine.html, in the pseudocode for the multi-GPU network:

    # aggregate gradient and update
    fc1_wgrad[cpu] = fc1_wgrad[gpu0] + fc1_wgrad[gpu1]
    fc2_wgrad[cpu] = fc2_wgrad[gpu0] + fc2_wgrad[gpu1]
    fc1_weight[cpu] -= lr * fc1_wgrad[gpu0]
    fc2_weight[cpu] -= lr * fc2_wgrad[gpu0]

I think the last two lines should use the aggregated gradients on 'cpu' (fc1_wgrad[cpu] and fc2_wgrad[cpu]) instead of the gradients on 'gpu0'. As written, the gradients from gpu1 are summed but never applied to the weights. The same incorrect lines have also been copied into the picture below the code.

Best,
Sebastian
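
P.S. To illustrate the difference, here is a minimal sketch of the update lines quoted above. It assumes plain NumPy arrays stand in for the per-device buffers and that dictionary keys stand in for the devices; the numeric values and the `sgd_step` helper are made up for the example and are not part of MXNet.

```python
import numpy as np

def sgd_step(weight, grad, lr):
    """Hypothetical helper: return the weight after one plain SGD step."""
    return weight - lr * grad

lr = 0.1

# Made-up per-device gradients and a CPU copy of the weight.
fc1_wgrad = {
    "gpu0": np.array([1.0, 2.0]),
    "gpu1": np.array([3.0, 4.0]),
}
fc1_weight = {"cpu": np.array([10.0, 10.0])}

# Aggregate the gradients from both GPUs on the CPU.
fc1_wgrad["cpu"] = fc1_wgrad["gpu0"] + fc1_wgrad["gpu1"]

# Update as currently documented: only the gpu0 gradient is applied.
as_documented = sgd_step(fc1_weight["cpu"], fc1_wgrad["gpu0"], lr)

# Update as I believe it was intended: the aggregated gradient is applied.
corrected = sgd_step(fc1_weight["cpu"], fc1_wgrad["cpu"], lr)

print("as documented:", as_documented)
print("corrected:    ", corrected)
```

With these values the two results differ, because the documented version ignores the contribution from gpu1.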