matteosal commented on issue #21111:
URL:
https://github.com/apache/incubator-mxnet/issues/21111#issuecomment-1235710738
@DickJC123 you were in fact right about biased vs unbiased variance
computation. This script tests that claim by letting a non-cudnn batchnorm and
a cudnn batchnorm update their moving variance, then checking that the two
updates differ and that they correspond respectively to the biased (non-cudnn)
and the unbiased (cudnn) computation:
```
import mxnet as mx
import numpy as np
from mxnet import autograd
print("**** cudnn batchnorm variance")
shapes = {'input': [1, 6, 5], 'gamma': [6], 'beta': [6], 'mean': [6], 'var': [6]}
# Define batchnorms with identical specs except cudnn_off
# Note that momentum is 0, so the moving arrays are replaced every time with the latest batch statistics
sym1 = mx.symbol.BatchNorm(
*[mx.symbol.Variable(name) for name in shapes.keys()],
eps=0.001,
momentum=0,
fix_gamma=False,
use_global_stats=False,
axis=1,
cudnn_off=True
)
sym2 = mx.symbol.BatchNorm(
*[mx.symbol.Variable(name) for name in shapes.keys()],
eps=0.001,
momentum=0,
fix_gamma=False,
use_global_stats=False,
axis=1,
cudnn_off=False
)
op1 = mx.ndarray.CachedOp(sym1)
op2 = mx.ndarray.CachedOp(sym2)
# Define arrays for op1 and op2
# They are identical now, but they will be changed differently by the ops
args1 = [mx.np.random.uniform(size=shape, ctx=mx.gpu()) for shape in shapes.values()]
args2 = [mx.np.array(array, ctx=mx.gpu()) for array in args1]
data, gamma, beta, mean, var = args1
# Evaluation in training mode with backward that rewrites moving mean and var
with autograd.record(train_mode=True):
    [arg.attach_grad() for arg in args1]
    [arg.attach_grad() for arg in args2]
    dummy1 = op1(*args1, default_ctx=mx.gpu())
    dummy2 = op2(*args2, default_ctx=mx.gpu())
autograd.backward(dummy1, head_grads=mx.np.ones(shapes['input'], ctx=mx.gpu()))
autograd.backward(dummy2, head_grads=mx.np.ones(shapes['input'], ctx=mx.gpu()))
# Check that outputs are the same
print()
print("difference between training mode outputs")
print(mx.np.max(mx.np.abs(dummy1 - dummy2)))
# Check updated moving vars and observe they are different
print()
print("variance updated by the non-cudnn batchnorm")
print(args1[-1])
print("variance updated by the cudnn batchnorm")
print(args2[-1])
# Manually compute biased and unbiased variance
data_mean = mx.np.mean(data, axis=(-1))
data_zeromean = data - data_mean[:, :, np.newaxis]
var1 = mx.np.mean((data_zeromean * data_zeromean), axis=(-1))
var2 = var1 * shapes['input'][-1] / (shapes['input'][-1] - 1)
print()
print("manual biased variance")
print(var1)
print("manual unbiased variance")
print(var2)
```
The output is:
```
**** cudnn batchnorm variance
difference between training mode outputs
2.3841858e-07
variance updated by the non-cudnn batchnorm
[0.12171984 0.03338415 0.03920404 0.04988261 0.02153183 0.02420242] @gpu(0)
variance updated by the cudnn batchnorm
[0.15214981 0.04173018 0.04900505 0.06235326 0.02691478 0.03025302] @gpu(0)
manual biased variance
[[0.12171984 0.03338414 0.03920404 0.04988261 0.02153182 0.02420242]] @gpu(0)
manual unbiased variance
[[0.1521498 0.04173018 0.04900505 0.06235326 0.02691478 0.03025302]] @gpu(0)
```
So this shows that:
1) The training mode output is the same for the non-cudnn and cudnn
implementations ("difference between training mode outputs"), so at this step
they compute the data variance in the same way. It can be checked manually
that their result corresponds to using the biased variance.
2) However, the way they end up updating their moving variance is different:
the non-cudnn case uses the biased variance as before, but the cudnn case uses
the unbiased variance this time. Note that momentum is set to 0 for both ops,
which means the moving arrays are simply replaced with the latest values; this
makes the results easy to check.
3) This explains the numerical error found in my original report. For a
spatial size of 1, the biased variance (which is exactly 0) gets multiplied by
the correction factor **1 / (1 - 1)**, producing **nan**, which makes a
subsequent evaluation fail in the cudnn case (see the sketch below).
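
For reference, here is a small standalone sketch of the same relationship in
plain NumPy (it deliberately avoids the MXNet BatchNorm API, so it only
illustrates the arithmetic, not either implementation); it also shows how a
spatial size of 1 turns the unbiased estimate into nan:
```
import numpy as np

x = np.random.uniform(size=(1, 6, 5)).astype(np.float32)
n = x.shape[-1]  # spatial size, 5 here

biased = x.var(axis=-1)            # divides by n: what both implementations output in training mode
unbiased = x.var(axis=-1, ddof=1)  # divides by n - 1: what the cudnn path stores as moving variance
print(np.max(np.abs(unbiased - biased * n / (n - 1))))  # ~0: unbiased = biased * n / (n - 1)

# With momentum = 0 the moving variance is simply replaced by the new value:
# moving_var = momentum * moving_var + (1 - momentum) * new_var = new_var

# With a spatial size of 1 the correction factor becomes 1 / (1 - 1):
x1 = x[..., :1]                 # spatial size 1, as in the original report
print(x1.var(axis=-1))          # biased variance: exactly 0
print(x1.var(axis=-1, ddof=1))  # unbiased variance: 0 / 0 -> nan (numpy emits a RuntimeWarning)
```
The last line is exactly the 0 / 0 that ends up in the cudnn moving variance
and breaks the subsequent evaluation.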