zhreshold commented on issue #16708: Training an FPN model using grad_req="add" 
 causes rapid divergence, while manually implemented gradient accumulation 
works fine
URL: 
https://github.com/apache/incubator-mxnet/issues/16708#issuecomment-558876214
 
 
   After digging for a while, I found several confusing facts about this bug.
   
   1. @nickguletskii is correct: it's not about `ElementWiseSum`.
   2. Duplicating the same node 1, 2, 3, 4, 8, 9, 10... times, the loss and 
gradients are always GOOD; however, with 5, 6, or 7 duplicates, the gradients 
diverge at the first iteration (a repro sketch follows below).
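
   For readers following along, here is a minimal sketch (my own, not the 
exact script used in this issue) of how the duplication counts could be 
probed in Gluon: a `Dense` layer stands in for the shared FPN head, and 
`add_n` merely sums the duplicated outputs (per point 1, the sum op itself 
is not the culprit). A toy layer like this may or may not reproduce the 
divergence seen in the full FPN model.

   ```python
   import mxnet as mx
   from mxnet import autograd, gluon

   def dup_grads(n_dup, grad_req='add'):
       # fresh layer each call so accumulated gradient state cannot leak
       net = gluon.nn.Dense(1)
       net.initialize(mx.init.Constant(1.0))
       net.collect_params().setattr('grad_req', grad_req)
       x = mx.nd.ones((1, 4))
       with autograd.record():
           # the same node appears n_dup times in the graph; add_n just
           # sums the duplicates
           loss = mx.nd.add_n(*[net(x) for _ in range(n_dup)])
       loss.backward()
       return net.weight.grad().asnumpy()

   for n in (3, 4, 5, 6, 7, 8):
       print(n, dup_grads(n))
   ```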
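
   For contrast, the "manually implemented gradient accumulation" mentioned 
in the title could look roughly like the following. This is an assumed 
reconstruction, since the original workaround code isn't quoted in this 
issue: keep the default `grad_req='write'`, copy the gradients out after 
each backward, and sum them in user code before a single optimizer step.

   ```python
   import mxnet as mx
   from mxnet import autograd, gluon

   net = gluon.nn.Dense(1)
   net.initialize()
   trainer = gluon.Trainer(net.collect_params(), 'sgd',
                           {'learning_rate': 0.1})

   params = list(net.collect_params().values())
   accum = None
   for i in range(4):
       x = mx.nd.random.uniform(shape=(2, 8))
       with autograd.record():
           loss = net(x).sum()
       loss.backward()  # grad_req='write' overwrites grads each time
       grads = [p.grad().copy() for p in params]
       accum = grads if accum is None else [a + g for a, g in zip(accum, grads)]

   # write the summed gradients back, then take one optimizer step
   for p, g in zip(params, accum):
       p.grad()[:] = g
   trainer.step(4 * 2)
   ```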
   
