chrishkchris commented on a change in pull request #468: Distributted module URL: https://github.com/apache/incubator-singa/pull/468#discussion_r317488906
########## File path: python/singa/autograd.py ##########

```diff
@@ -1286,25 +1287,26 @@ def set_params(self, **parameters):

 class _BatchNorm2d(Operation):
-    def __init__(self, handle, name=None):
+    def __init__(self, handle, running_mean, running_var, name=None):
         super(_BatchNorm2d, self).__init__(name)
         self.handle = handle
+        self.running_mean = running_mean.data
+        self.running_var = running_var.data

-    def forward(self, x, scale, bias, running_mean, running_var):
-        self.running_mean = running_mean
-        self.running_var = running_var
+    def forward(self, x, scale, bias):
         if training:
             if isinstance(self.handle, singa.CudnnBatchNormHandle):
                 y, mean, var = singa.GpuBatchNormForwardTraining(
-                    self.handle, x, scale, bias, running_mean, running_var
+                    self.handle, x, scale, bias, self.running_mean, self.running_var
```

Review comment: I cannot find `CpuBatchNormHandle`; the handle used by the CPU path is `BatchNormHandle`, so I used `BatchNormHandle` instead of `CudnnBatchNormHandle`. Both CPU and GPU can run now, but I used `type` instead of `isinstance`, because `CudnnBatchNormHandle` is a subclass of `BatchNormHandle`, so `isinstance()` treats a `CudnnBatchNormHandle` as an instance of `BatchNormHandle`. For more explanation, see http://www.runoob.com/python/python-func-isinstance.html

Moreover, I have debugged the CPU batchnorm in two respects:

(i) The forward function of the CPU batchnorm needs the running mean and variance to be initialized; otherwise, when it accesses the block, it raises an error about reading an uninitialized block. I fixed this by initializing the mean to 0 and the variance to 1.

(ii) The backward function `CpuBatchNormBackward` does not exist, but there is another function called `CpuBatchNormBackwardx` (in src/model/operation/batchnorm.cc), so I use that function and provide all the necessary arguments.

The program runs now, and I am doing a brief CIFAR-10 training test on AWS (using a c5.4xlarge with 16 CPU cores).
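The `type` vs. `isinstance` point above can be sketched with stand-in classes. These placeholders only mirror the inheritance relationship described in the comment; they are not the real SINGA bindings:

```python
# Placeholder classes mirroring the inheritance described above;
# the real handles live in the singa C++ bindings.
class BatchNormHandle:                        # CPU handle (base class)
    pass

class CudnnBatchNormHandle(BatchNormHandle):  # GPU handle (subclass)
    pass

gpu_handle = CudnnBatchNormHandle()

# isinstance() also matches subclasses, so a GPU handle passes a
# check against the CPU base class and would take the wrong branch:
print(isinstance(gpu_handle, BatchNormHandle))   # True

# An exact type comparison keeps the two branches distinct:
print(type(gpu_handle) == BatchNormHandle)       # False
print(type(gpu_handle) == CudnnBatchNormHandle)  # True
```

This is why dispatching on `type(self.handle) == singa.CudnnBatchNormHandle` selects the GPU path only for the exact GPU handle class, while `isinstance` would not.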
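The initialization fix in (i) can be sketched in NumPy. The channel count and variable names here are illustrative assumptions, not the SINGA tensor API:

```python
import numpy as np

# Hypothetical channel count, for illustration only.
num_channels = 3

# Initialize running statistics so the first forward pass never reads
# an uninitialized block: mean starts at 0 and variance at 1, matching
# the fix described in point (i) above.
running_mean = np.zeros(num_channels, dtype=np.float32)
running_var = np.ones(num_channels, dtype=np.float32)

print(running_mean)  # [0. 0. 0.]
print(running_var)   # [1. 1. 1.]
```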