[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-397011088

seed 1060292419

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards,
Apache Git Services
[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-396886126

I have likely found the root cause of this problem; flagging it here so we don't duplicate effort on this one.
[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-386885128

Reproducible 100% with:

```
export MXNET_TEST_SEED=1688524483
nosetests-3.4 -s -v test_operator_gpu.py:test_binary_op
```

```
diff --git a/tests/python/unittest/test_operator.py b/tests/python/unittest/test_operator.py
index 5d38222..04e880c 100644
--- a/tests/python/unittest/test_operator.py
+++ b/tests/python/unittest/test_operator.py
@@ -1429,6 +1429,16 @@ def check_binary_op_backward(symbol, baseline, gen_data, rtol=1e-3, atol=1e-5):
         y.forward(is_train=True)
         y.backward([mx.nd.array(out)])
         assert_allclose(y_1.asnumpy(), x_1, rtol=rtol, atol=atol)
+        z = np.abs(y_2.asnumpy() - x_2)
+        w = np.where(z > atol)
+        if w[0].size > 0:
+            print("d[0].shape: {} d[1].shape: {} baseline_grad2.shape: {}".format(d[0].shape, d[1].shape, baseline_grad2.shape))
+            print(w)
+            print(y_2[w])
+            print(x_2[w])
+            print(z[w])
+            print(d[0][w])
+            print(d[1][w])
```
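For reference, the debug prints in that patch boil down to the pattern below. This is a standalone sketch with synthetic arrays (`computed` and `baseline` are hand-made stand-ins, not values from the test), showing how to locate the elements that fall outside `assert_allclose`'s tolerance:

```python
import numpy as np

# Synthetic stand-ins for the operator's gradient and the numpy baseline;
# rtol/atol match the defaults of check_binary_op_backward.
rtol, atol = 1e-3, 1e-5
baseline = np.linspace(-1.0, 1.0, 8).astype(np.float32)
computed = baseline.copy()
computed[3] += 0.5  # inject one element that is far outside tolerance

z = np.abs(computed - baseline)
# assert_allclose's criterion is |actual - desired| <= atol + rtol * |desired|
w = np.where(z > atol + rtol * np.abs(baseline))
if w[0].size > 0:
    print("bad indices:", w[0])
    print("computed:", computed[w])
    print("baseline:", baseline[w])
    print("abs diff:", z[w])
```

Printing the offending indices alongside both operands (as the patch does with `d[0][w]` and `d[1][w]`) makes it possible to see which input values trigger the failure instead of just getting a pass/fail.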
[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-386885067

I tried increasing the tolerance, but I found one failure where the difference is much bigger than expected (0.28679015). I think we should look deeper into this:

```
[-116.15162]   <- input
[-115.8648288] <- gradient
[0.28679015]   <- diff
[0.8396868]    <- a
[0.0020733]    <- b
FAIL
```
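A minimal sketch (values chosen by hand, not taken from the failing seed) of how a difference this large can appear with mod: when the dividend sits almost exactly on a multiple of the divisor, float32 and float64 can land on opposite sides of the boundary, so the remainders differ by nearly `b` rather than by an ulp:

```python
import numpy as np

# b is exactly representable; a sits 1e-9 below 3*b in float64.
b = np.float64(0.75)
a = 3.0 * b - 1e-9

r64 = a % b                          # just under b: ~0.749999999
r32 = np.float32(a) % np.float32(b)  # a rounds up to exactly 3*b: remainder 0.0
print(r64, r32)                      # the two precisions disagree by almost b
```

No tolerance tight enough to be useful will absorb a jump of size `b`, which is why bumping rtol/atol alone cannot make such a failure go away.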
[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-386875707

I think the cause of this is that the mod operator is using doubles for the computation while the test forces float32; the floating-point modulo operator also seems to give different results on GPU vs CPU. Why is fmod in CUDA giving different results? According to table 7 here https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#introduction-cuda-dynamic-parallelism there should be no difference for fmod.

https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_operator.py#L1511
https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L402

This discrepancy could be the cause of the difference between results:

```
>>> np.double(1.68) % np.double(1.30123)
0.378769983
>>> np.float32(1.68) % np.float32(1.30123)
0.37877
```
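To gauge how large the discrepancy between the two precisions can get, here is a sketch (synthetic uniform random data, not the test's own generator) comparing a remainder computed in float64 and cast down against one computed in float32 end to end:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a64 = rng.uniform(-200.0, 200.0, size=n)
b64 = rng.uniform(0.001, 2.0, size=n)
a32, b32 = a64.astype(np.float32), b64.astype(np.float32)

ref = (a64 % b64).astype(np.float32)  # double-precision path, cast down
got = a32 % b32                       # float32 path throughout
diff = np.abs(ref.astype(np.float64) - got)

# Typical elements agree to roughly float32 precision, but wherever `a`
# falls almost exactly on a multiple of `b` the two paths pick different
# quotients and the remainders differ by nearly `b` -- far beyond any
# atol a unit test would use.
print("max diff:", diff.max(), "median diff:", np.median(diff))
```

This suggests the flakiness is structural: any seed that happens to generate a near-boundary input will trip the comparison, regardless of how the tolerance is tuned.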