TristonC edited a comment on issue #18751:
URL: 
https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-663168367


   I am not sure that putting the running mean and running var into the 
backward pass is the solution; the same effect can be achieved by setting 
autograd.record(train_mode=False). The problem here (NaN) lies in how the 
running var is computed, whether unbiased (divided by m - 1, as in the BN 
paper) or biased (divided by m). The corner case is m = 1, which occurs when 
the batch size is one and BN follows a Dense (or Flatten) layer. My question 
is whether keeping 9408 or 32 running means and running vars is really what 
the network is trying to do. BTW, the NaN is frustrating for end users, and 
we need to find a way to solve it.
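   As a minimal NumPy sketch of the m = 1 corner case above (not MXNet's 
actual BN kernel): the unbiased estimator divides by m - 1, so with a single 
sample per channel the variance becomes 0/0 and the running var turns NaN, 
while the biased estimator stays finite.

```python
import warnings
import numpy as np

# Per-channel activations for a batch of size m = 1, e.g. BN placed after a
# Dense/Flatten layer, where each channel sees exactly one value per step.
x = np.array([3.0])

with warnings.catch_warnings():
    # NumPy warns "Degrees of freedom <= 0" for ddof=1 on one sample.
    warnings.simplefilter("ignore", RuntimeWarning)
    biased = np.var(x, ddof=0)    # divide by m      -> 0.0 (finite)
    unbiased = np.var(x, ddof=1)  # divide by m - 1  -> 0/0 = NaN

print(biased, unbiased)
```

   Once a NaN enters the running var, every subsequent EMA update and 
inference-mode normalization propagates it, which matches the behavior end 
users are reporting.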


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

