DickJC123 commented on a change in pull request #13749: Add NHWC layout support to Pooling (cpu, gpu cuda, gpu cuDNN)
URL: https://github.com/apache/incubator-mxnet/pull/13749#discussion_r254117531
 
 

 ##########
 File path: src/operator/nn/pool_utils.h
 ##########
 @@ -98,14 +98,16 @@ struct lp_grad<DType, 1> {
 template<typename DType>
 struct lp_grad<DType, 2> {
   static MSHADOW_XINLINE DType Map(const DType grad, const DType in_data, const DType out_data) {
-    return grad * in_data / out_data;
+    // Avoid nan result if both grad and out_data are 0.
+    return (grad == DType(0.0)) ? DType(0.0) : grad * in_data / out_data;
   }
 };
 
 template<typename DType>
 struct lp_grad<DType, 3> {
   static MSHADOW_XINLINE DType Map(const DType grad, const DType in_data, const DType out_data) {
-    return grad * in_data * in_data / (out_data * out_data);
+    // Avoid nan result if both grad and out_data are 0.
 +    return (grad == DType(0.0)) ? DType(0.0) : grad * in_data * in_data / (out_data * out_data);
 
 Review comment:
   I've pushed a solution addressing your comment in commit
https://github.com/apache/incubator-mxnet/pull/13749/commits/098bc49f1d288ea9f2b64453aefcc1537ca5254e.
   
   The grad == 0.0 check that you highlighted only worked because of a quirk of
our check_consistency() routine in test_utils.py, which uses the symbol's
forward() output as the gradient.  Per your suggestion, I'm now checking
out_data == 0 instead, as the more general way of quieting the test failures
(sketched just below).
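   For concreteness, a minimal sketch of the shape of that guard for the lp-2
case (illustrative only; the actual change is the commit linked above and may
differ in detail):

    template<typename DType>
    struct lp_grad<DType, 2> {
      static MSHADOW_XINLINE DType Map(const DType grad, const DType in_data, const DType out_data) {
        // If the forward output underflowed to 0, return a 0 gradient rather than inf/nan.
        return (out_data == DType(0.0)) ? DType(0.0) : grad * in_data / out_data;
      }
    };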
   
   The test failures I was seeing often occurred in float16 lp-3 pooling.  For
example, take the case where a pool window of 2 has identical inputs 2^-9 and
2^-9.  The forward output for this case is the cube root of (2^-9)^3 + 
(2^-9)^3.  If this is calculated in float16, the 2^-27 terms underflow to 0 and 
the output is 0.  The backward output is then grad * 2^-9 * 2^-9 / (0 * 0) = 
+inf (or nan if grad is also 0).  When performed in float32, no underflow 
occurs in the forward op, and +infs are avoided in the backward op.
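   To spell out the underflow bound (standard IEEE binary16 facts, nothing
specific to this PR), in LaTeX notation:

    \left(2^{-9}\right)^3 = 2^{-27} < 2^{-24} = 2^{-14} \cdot 2^{-10}
                                              = \text{smallest positive float16 subnormal}

so each cubed term rounds to 0 and the window sum is exactly 0 before the cube
root is taken.  The smallest float32 subnormal is 2^{-149}, so in float32 the
2^{-27} terms are represented exactly and the forward output stays nonzero.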
   
   My conclusion: float16 is ill-equipped to perform the forward pooling
operation for lp-2 and lp-3.  Part of my solution therefore promotes the
calculation to float32 for the cpu and mxnet cuda implementations of
float16-i/o pooling.  This is consistent with other operators like float16-i/o
Convolution and Batchnorm, which perform their internal calculations in
float32.  With this in place, I've run test_pooling_versions() thousands of
times with no failures.  A rough sketch of the idea follows.
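   By way of illustration, the kind of promotion I mean looks roughly like the
sketch below (hypothetical helper; the name lp3_pool_window and the AccType
parameter are made up here, and the real change lives in the pooling kernels
touched by this PR):

    #include <cmath>

    // Sketch only: accumulate the lp-3 window sum in a wider type (e.g. float
    // when DType is a 16-bit float), converting back to DType only at the end.
    template<typename DType, typename AccType = float>
    DType lp3_pool_window(const DType* in, int size) {
      AccType sum = AccType(0);
      for (int i = 0; i < size; ++i) {
        const AccType v = static_cast<AccType>(in[i]);
        sum += v * v * v;   // the 2^-27-sized terms remain representable in float32
      }
      return static_cast<DType>(std::cbrt(sum));
    }

   The important point is simply that the cubes and their sum are formed in
float32, with only the final result narrowed back to the float16 i/o type.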

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
