I made a PR that addresses just this test (I also ran it 10000 times with different seeds): https://github.com/apache/incubator-mxnet/pull/12560.
In reply to @luobao-intel: this is not due to the inputs being too large. The activation is linear above 0, so this is not an approximation error either; in fact, we should be able to get an exact solution. The reason the change is causing an error is that with a very small eps, the outputs f(x + eps/2) and f(x - eps/2) do not have enough precision. The formula is
```
grad = (f(x + eps/2) - f(x - eps/2)) / eps
```
Since eps was 1e-6, the gradient is computed from a difference on the order of 1e-6, so the outputs must capture changes below 1e-6, which they cannot do reliably.

[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12377 ] This message was relayed via gitbox.apache.org for [email protected]
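To illustrate the precision point, here is a minimal NumPy sketch (not the MXNet test code). It assumes the outputs are float32 and uses a hypothetical `relu` and `central_diff` helper; the input value 1.0 is chosen only for illustration.

```python
import numpy as np

def relu(x):
    # Linear above 0, so the true gradient at x > 0 is exactly 1.
    return np.maximum(x, 0)

def central_diff(f, x, eps, dtype):
    # Central difference: (f(x + eps/2) - f(x - eps/2)) / eps,
    # with every intermediate value held in the given dtype.
    x = dtype(x)
    eps = dtype(eps)
    half = eps / dtype(2)
    return (f(x + half) - f(x - half)) / eps

# Assumption: float32 outputs, evaluated at x = 1.0.
grad_small_eps = central_diff(relu, 1.0, 1e-6, np.float32)
grad_large_eps = central_diff(relu, 1.0, 1e-3, np.float32)
grad_float64 = central_diff(relu, 1.0, 1e-6, np.float64)

print("float32, eps=1e-6:", grad_small_eps)
print("float32, eps=1e-3:", grad_large_eps)
print("float64, eps=1e-6:", grad_float64)
```

Near 1.0, adjacent float32 values are roughly 1e-7 apart, so a step of 5e-7 gets rounded noticeably and the eps=1e-6 estimate drifts away from 1. A larger eps or a wider dtype brings it back close to the exact gradient.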
