chrishkchris commented on a change in pull request #468: Distributted module
URL: https://github.com/apache/incubator-singa/pull/468#discussion_r317488906
 
 

 ##########
 File path: python/singa/autograd.py
 ##########
 @@ -1286,25 +1287,26 @@ def set_params(self, **parameters):
 
 
 class _BatchNorm2d(Operation):
-    def __init__(self, handle, name=None):
+    def __init__(self, handle, running_mean, running_var, name=None):
         super(_BatchNorm2d, self).__init__(name)
         self.handle = handle
+        self.running_mean = running_mean.data
+        self.running_var = running_var.data
 
-    def forward(self, x, scale, bias, running_mean, running_var):
-        self.running_mean = running_mean
-        self.running_var = running_var
+    def forward(self, x, scale, bias):
         if training:
 
             if isinstance(self.handle, singa.CudnnBatchNormHandle):
                 y, mean, var = singa.GpuBatchNormForwardTraining(
-                    self.handle, x, scale, bias, running_mean, running_var
 +                    self.handle, x, scale, bias, self.running_mean, self.running_var
 
 Review comment:
   There is no CpuBatchNormHandle; the handle used by the CPU is BatchNormHandle. So I have used BatchNormHandle instead of CudnnBatchNormHandle.
   
   Now both CPU and GPU can run, but I used `type` instead of `isinstance` because CudnnBatchNormHandle is a subclass of BatchNormHandle, so `isinstance()` would treat a CudnnBatchNormHandle object as an instance of BatchNormHandle. For more explanation, see http://www.runoob.com/python/python-func-isinstance.html
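   To illustrate the difference with stand-in classes (these are plain Python classes, not the real SINGA handles):

```python
class BatchNormHandle:
    pass

class CudnnBatchNormHandle(BatchNormHandle):
    pass

h = CudnnBatchNormHandle()

# isinstance() follows the class hierarchy, so a CudnnBatchNormHandle
# also counts as a BatchNormHandle:
print(isinstance(h, BatchNormHandle))       # True
print(isinstance(h, CudnnBatchNormHandle))  # True

# type() compares the exact class, so the two handles can be told apart:
print(type(h) == BatchNormHandle)           # False
print(type(h) == CudnnBatchNormHandle)      # True
```

   This is why an `isinstance(self.handle, BatchNormHandle)` check would also match the GPU handle, while comparing `type(self.handle)` selects the CPU path only for an exact BatchNormHandle.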
   
   Moreover, I have debugged the CPU batchnorm in the following two respects:
   (i) The forward function of the CPU batchnorm needs the running mean and var to be initialized; otherwise, when it accesses the block, it raises an error for accessing an uninitialized block. I fixed this by initializing the mean to 0 and the var to 1.
   (ii) The backward function CpuBatchNormBackward does not exist, but there is another function called CpuBatchNormBackwardx (in src/model/operation/batchnorm.cc), so I use that function, providing all the necessary arguments.
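   As a quick sanity check of why mean = 0 and var = 1 are safe initial values (plain NumPy here, not the SINGA kernels): with those statistics, inference-mode batch normalization reduces to approximately the identity before scale and bias are applied.

```python
import numpy as np

eps = 1e-5
x = np.random.randn(4, 3).astype(np.float32)  # a small (N, C) batch

running_mean = np.zeros(3, dtype=np.float32)  # mean initialized to 0
running_var = np.ones(3, dtype=np.float32)    # var initialized to 1

# inference-mode batchnorm with gamma = 1, beta = 0
y = (x - running_mean) / np.sqrt(running_var + eps)

# with mean 0 and var 1 the output is (almost exactly) the input
print(np.allclose(x, y, atol=1e-4))  # True
```

   So before any training step has updated the running statistics, the normalization is a near no-op rather than a read of garbage memory.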
   
   The program can run now, but I am doing a brief CIFAR-10 training test on AWS (using a c5.4xlarge instance with 16 CPU cores).
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
