luobao-intel edited a comment on issue #12377: Flaky test: test_mkldnn.test_activation
URL: https://github.com/apache/incubator-mxnet/issues/12377#issuecomment-416861592

This test validates the activation computation in MKL-DNN by checking its gradient against a numeric gradient computed by code adapted from theano.gradient.numeric_grad. However, that numeric gradient is incorrect when an input element is close to zero, so flaky failures occur whenever the random input vector happens to contain extremely small positive numbers. The experiments below demonstrate this.

## experiment 1:

```
input data:
[[1, 2], [3, 0.0001]]

location:
{'data': <RowSparseNDArray 2x2 @cpu(0)>,
 '__random_proj':
 [[0.3546685  0.8954062 ]
  [0.40476447 0.7724642 ]]
 <NDArray 2x2 @cpu(0)>}

gradient computed by the theano-style code:
[[0.35466552 0.8954048 ]
 [0.40476322 0.39395675]]

mkldnn gradient:
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]
```

## experiment 2:

```
input data:
[[1, -2], [-4, 0.0005]]

location:
{'data': <RowSparseNDArray 2x2 @cpu(0)>,
 '__random_proj':
 [[0.3546685  0.8954062 ]
  [0.40476447 0.7724642 ]]
 <NDArray 2x2 @cpu(0)>}

gradient computed by the theano-style code:
[[0.35466552 0.        ]
 [0.         0.4248553 ]]

mkldnn gradient:
[[0.3546685  0.        ]
 [0.         0.7724642 ]]
```

## analysis

The derivative of the ReLU function is 0 for x < 0 and 1 for x > 0. Therefore, in check_numeric_gradient, the executor's gradient should, element-wise, equal the corresponding `__random_proj` entry wherever the input element is positive and 0 wherever it is negative. The theano-style numeric gradient is clearly wrong when an input element is close to zero: in experiment 1 it returns 0.39395675 instead of 0.7724642 for the element whose input is 0.0001, and in experiment 2 it returns 0.4248553 instead of 0.7724642 for the element whose input is 0.0005. The cause is that the finite-difference step straddles ReLU's kink at zero: for 0 < x < eps/2, a symmetric difference (relu(x + eps/2) - relu(x - eps/2)) / eps evaluates to 0.5 + x/eps rather than the true derivative 1, which with a step of eps = 1e-2 reproduces exactly the factors 0.51 and 0.55 observed above.
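To make the failure mode concrete, here is a minimal, self-contained NumPy sketch of a symmetric finite-difference gradient in the style of theano.gradient.numeric_grad. This is not the actual test harness: the step size eps = 1e-2 and the sum-reduction are assumptions chosen so the output reproduces the 0.51 factor seen in experiment 1, and the multiplication by the random projection `__random_proj` that the real test performs is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def numeric_grad(f, x, eps=1e-2):
    """Symmetric finite difference in the style of
    theano.gradient.numeric_grad (eps assumed here):
        dg/dx[i] ~= (g(x + eps/2 * e_i) - g(x - eps/2 * e_i)) / eps,
    where g = sum(f(x)) so the output of f reduces to a scalar."""
    grad = np.empty_like(x)
    flat_x = x.ravel()          # view into x, so writes perturb x in place
    flat_g = grad.ravel()
    for i in range(flat_x.size):
        orig = flat_x[i]
        flat_x[i] = orig + eps / 2.0
        f_plus = f(x).sum()
        flat_x[i] = orig - eps / 2.0
        f_minus = f(x).sum()
        flat_x[i] = orig        # restore the original value
        flat_g[i] = (f_plus - f_minus) / eps
    return grad

x = np.array([[1.0, 2.0], [3.0, 1e-4]])  # experiment 1 input
print(numeric_grad(relu, x))
# [[1.   1.  ]
#  [1.   0.51]]
# The element 1e-4 lies inside the half-step eps/2 = 5e-3, so the
# difference quotient straddles ReLU's kink at zero and evaluates to
# ((1e-4 + 5e-3) - 0) / 1e-2 = 0.51 instead of the true derivative 1.
# Scaled by the projection 0.7724642, this gives 0.39395..., matching
# the flaky value reported in experiment 1.
```

One way to avoid this kind of flakiness is to keep the random test inputs farther than eps/2 away from zero, so the finite difference never straddles the non-differentiable point.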